Incremental models in dbt allow you to process only new or updated rows in a table, significantly improving performance because you don't have to rebuild the entire table from scratch.

This approach accelerates transformations and reduces computational costs. However, setting up these models can be challenging for beginners.

Let’s dive in.

What are Incremental Models?

Unlike the traditional table materialization method, which rebuilds the entire table every time you run a job in dbt, incremental models only process new or updated rows in your dataset.

This approach minimizes the amount of data that needs to be transformed, reducing execution time and lowering the cost of running queries.

How It Works

Incremental models filter rows that have been added or updated based on a specific column. This is typically a timestamp or a unique key (e.g., an order number in integer format). In this context, I will refer to this key as a checkpoint.

The dbt job compares the checkpoint value in that column to the maximum value already present in the destination table.

For example, suppose you're working with a website's behavioural events table. In that case, the incremental model will only process new events with an event datetime greater than the most recent date in your existing data model.

Configuring an Incremental Model in dbt

The first step when working with incremental models in dbt is to set the model configuration.

Specify the Materialization Type

At the start of the model, you must specify the materialization type as incremental. This is done using the config() block:

```sql
{{ config(
    materialized='incremental'
) }}
```

This tells dbt that the model should update incrementally, meaning it will add only new or updated rows during each run, rather than overwrite the entire table.
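Alongside the in-model config() block, the same setting can also be applied from dbt_project.yml; the project and folder names below are placeholders for your own:

```yaml
# dbt_project.yml (sketch; 'my_project' and 'events' are placeholder names)
models:
  my_project:
    events:
      +materialized: incremental
```

A per-model config() block takes precedence over the project-level setting, so the two approaches can be combined.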

Create a dbt model with an incremental update setup

Next is the main SQL query that will be used to build the model. Here is a sample query from a real project:

```sql
{{ config(
    materialized='incremental'
) }}

SELECT
    *,
    CURRENT_TIMESTAMP() AS batch_timestamp -- capture the load time
FROM {{ source('tracking_v2', 'renta_events') }}
WHERE date >= '2025-03-01' -- base filter on date

{% if is_incremental() %}
  AND dateTime > (SELECT MAX(dateTime) FROM {{ this }}) -- incremental condition
{% endif %}
```

How is_incremental() works

The is_incremental() block is a built-in dbt macro function. It returns true if:

  1. The table already exists in the database.

  2. The model is not run with the --full-refresh flag.

  3. The materialization type is set to incremental.

If all these conditions are met, dbt applies an additional condition to the query:

```sql
AND dateTime > (SELECT MAX(dateTime) FROM {{ this }})
```

This condition compares the dateTime column in the source table with the maximum dateTime value in the destination table.
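On an incremental run, the compiled query dbt executes boils down to something like the following. This is a simplified sketch: the exact insert statement is warehouse-specific, and `analytics.renta_events` stands in for your destination table:

```sql
-- Simplified sketch of the compiled incremental run; table names are illustrative
INSERT INTO analytics.renta_events
SELECT
    *,
    CURRENT_TIMESTAMP() AS batch_timestamp
FROM tracking_v2.renta_events
WHERE date >= '2025-03-01'
  AND dateTime > (SELECT MAX(dateTime) FROM analytics.renta_events)
```

Because the subquery reads the maximum checkpoint from the destination table itself, only rows newer than the last successful run are scanned and inserted.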

Run the dbt model

Run the dbt model using the dbt run command:

```shell
dbt run --select my_incremental_model
```

The model will first check whether the destination table exists and whether dbt is running in full-refresh mode.

If the table does not exist, dbt performs a full table creation. If the table exists, dbt only adds new data.

What happens on the first run?

A new table is created, and all data is loaded from the source table into the newly created model.

On subsequent runs, dbt will only add new rows. The batch_timestamp field added in the model makes these records easy to track: every inserted batch is stamped with the same load timestamp, so you can see exactly which rows were added during each run of the dbt job, which is useful for debugging the model.
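For example, here is a quick way to inspect how many rows each run inserted (assuming the model is named my_incremental_model):

```sql
-- Rows inserted per dbt run, grouped by load timestamp
SELECT
    batch_timestamp,
    COUNT(*) AS rows_inserted
FROM my_incremental_model
GROUP BY batch_timestamp
ORDER BY batch_timestamp DESC
```

If a run inserted an unexpected number of rows, the offending batch can be isolated by filtering on its batch_timestamp value.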

Rebuild a model from scratch

Sometimes you may need to completely rebuild a table, such as when you add new columns (though the on_schema_change configuration, covered below, is often a better fit for that case).

To perform a full refresh, run the following command:

```shell
dbt run --full-refresh --select my_incremental_model
```

What this command does:

  1. Deletes the current version of the table.

  2. Creates the table anew.

  3. Loads all available data from the source table (without the incremental filter).

In short, the table is dropped along with all existing data, recreated with the new schema, and fully reloaded from the source, so keep the cost of rescanning large source tables in mind.

Handling schema changes using incremental strategies

If the schema changes, dbt provides several options for how it should handle this via the on_schema_change configuration:

| Strategy | Description |
| --- | --- |
| ignore | Default behavior. Changes to the schema are ignored. |
| append_new_columns | Adds new columns to the table, but does not backfill values for existing rows. |
| fail | Stops execution if the schema is not synchronized with the model. |
| sync_all_columns | Adds new columns and removes deprecated columns to maintain synchronization. |

This behavior depends on the data warehouse used (see the dbt documentation for details). 

Below is an example of how to set the on_schema_change configuration to your dbt model:

```sql
{{ config(
    materialized='incremental',
    unique_key='eventId',
    incremental_strategy='merge',
    on_schema_change='append_new_columns'
) }}
```

What's going on here?

Let’s break down the configuration in more detail:

  1. materialized='incremental': This means the model will be updated incrementally, only adding new rows.

  2. unique_key='eventId': dbt will use eventId as the table's unique key. With this key, dbt can detect rows that already exist in the destination table and update them instead of inserting duplicates.

  3. incremental_strategy='merge': The merge strategy ensures that new rows are added and existing rows are updated.

  4. on_schema_change='append_new_columns': If new columns (e.g. user_agent or payment_method) appear in the source table, they will be automatically added to the model without causing errors.
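Under the merge strategy, the statement dbt generates is conceptually similar to the following. This is a rough sketch: the real SQL is produced by dbt and varies by warehouse, and the table and column names here are illustrative:

```sql
-- Conceptual sketch of the merge strategy; names and columns are illustrative
MERGE INTO analytics.renta_events AS target
USING new_rows AS source
  ON target.eventId = source.eventId
WHEN MATCHED THEN
  UPDATE SET target.dateTime   = source.dateTime,
             target.user_agent = source.user_agent
WHEN NOT MATCHED THEN
  INSERT (eventId, dateTime, user_agent)
  VALUES (source.eventId, source.dateTime, source.user_agent)
```

Rows whose eventId already exists are updated in place, while genuinely new events are inserted, which is why the combination of unique_key and the merge strategy prevents duplicates.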

You should keep in mind that new columns added through on_schema_change will not update existing rows.

If you need to update these rows, you must run a model refresh with the --full-refresh flag.

Conclusion

I hope this material serves as a great starting point for exploring incremental updates in dbt. I also recommend checking out the resources below, which cover different incremental update configuration strategies.

If you have any questions, feel free to contact me on LinkedIn. If you're using dbt, consider integrating Renta ETL with dbt Cloud, which enhances control over the ingestion and management of your data pipelines.

Frequently asked questions