MERGE mechanism
This section describes the data update logic when replicating data to Google BigQuery using Renta ELT.
Data schema requirements
- Unique key (required).
Required for identifying and updating records. The key field depends on the source (e.g., Primary Key for PostgreSQL CDC or Lead ID for Salesforce). Unique key specifications for each connector are available in the Sources documentation section. - Partitioning key (optional).
The field by which the physical separation of data (partitioning) will be performed in the target BigQuery table.
Replication algorithm
The data merge process consists of three stages:
- Staging.
New and modified data is loaded into a temporary BigQuery table (naming format:target_table_name_renta_temp). - Deduplication.
Within the temporary table, the actual states are selected. Records are grouped by unique key and sorted by timestamp (or another system parameter of the source). Only the latest version of the row is transferred to the target table. - Merge.
Filtered data from the temporary table is transferred to the target table.
Impact of partitioning on MERGE performance
The presence of a partitioning key critically affects the volume of processed data and query cost:
- With partitioning (recommended).
The MERGE operation affects only those partitions of the target table for which data is present in the temporary table. This eliminates unnecessary data reading. - Without partitioning.
BigQuery performs a full scan of the target table. For large tables, this results in reduced performance and increased costs.
Immutability of the partitioning key
The value of the selected partitioning key must be constant on the source side. If a record moves from one partition to another (for example, the updated_at field used as the key changes), the standard MERGE with a filter on the new partition will create a duplicate in the new partition, while the outdated row remains in the old partition (orphan record).
Handling deletes
When records are deleted in the source, physical deletion of rows in BigQuery is not performed. Renta applies the soft delete method: for a deleted record, the system field is_delete is set to True.
Ready to get started?
Build your data pipeline today or get a personalized demo. Start free!
Need help?
Get expert support to ensure your project succeeds. We're here to help!
Feature requests?
Help shape our product! Share your ideas for new features and integrations.