Backfill
Backfilling allows you to reprocess historical data in materialized tables when you need to fix data quality issues or update business logic. This guide explains the different backfill modes and how to use them effectively.
What is Backfilling?
When tables are materialized to storage (currently S3), there may be cases where you need to reprocess historical data due to:
- Data quality issues in the source data
- Updates to SQL transformations or business logic
- Late-arriving changes to data
Backfilling lets you selectively delete and recompute data for specific date ranges in materialized tables.
Prerequisites
Backfills are only supported for tables that use timestamp-based incremental materialization. This is because full refresh materialization has no concept of historical data ranges.
Backfill Modes
Daily Rolling Backfill
Some use cases require regular reprocessing of recent historical data, for example when the source updates the past 5 days of data every day.
To handle rolling backfills, configure the interval_to_process_for_incremental_run parameter in your table configuration. This parameter specifies how many days of historical data to reprocess. For example, with a 5-day interval:
- Each incremental run will reprocess the last 5 days of data (e.g. Jan 7-11 when processing Jan 11)
- Previously processed dates within the interval are skipped to avoid duplicate processing
This ensures data stays fresh while avoiding unnecessary reprocessing.
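As a rough illustration of the window arithmetic, the sketch below computes which dates a rolling backfill would cover. It assumes the interval is inclusive of the run date, which matches the Jan 7-11 example; the function name rolling_window, the interval_days argument (standing in for interval_to_process_for_incremental_run), and the year 2024 are hypothetical and not part of the product. It does not model the skipping of already processed dates.

```python
from datetime import date, timedelta


def rolling_window(run_date: date, interval_days: int) -> list[date]:
    """Return the dates covered by a rolling backfill, oldest first.

    interval_days stands in for interval_to_process_for_incremental_run.
    Assumption: the window includes run_date itself.
    """
    if interval_days < 1:
        raise ValueError("interval_days must be at least 1")
    start = run_date - timedelta(days=interval_days - 1)
    return [start + timedelta(days=i) for i in range(interval_days)]


# A 5-day interval when processing Jan 11 (the year is arbitrary).
window = rolling_window(date(2024, 1, 11), 5)
print(window[0], window[-1])  # prints "2024-01-07 2024-01-11"
```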
One-Time Backfill
For ad-hoc backfills, configure the backfill state in one of two modes:
Full Backfill
- Reprocesses all historical data from the table's first timestamp
- Use when you need to rebuild the entire table
Partial Backfill
- Reprocesses data for a specific date range
- Configure start and end timestamps
- Useful for fixing isolated data quality issues
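The two modes can be pictured as a small piece of state that resolves to a time range. The sketch below is only a mental model: BackfillState, its fields, and resolve_range are hypothetical names, not the product's configuration schema. It also assumes that a full backfill ends at the table's last processed timestamp, which the text above does not state.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional, Tuple


@dataclass(frozen=True)
class BackfillState:
    """Hypothetical shape of a one-time backfill request."""
    mode: str  # "full" or "partial"
    start: Optional[datetime] = None  # used by partial backfills only
    end: Optional[datetime] = None    # used by partial backfills only


def resolve_range(state: BackfillState, first_ts: datetime,
                  last_processed: datetime) -> Tuple[datetime, datetime]:
    """Turn a backfill state into the (start, end) range to reprocess."""
    if state.mode == "full":
        # Assumption: a full backfill runs from the table's first
        # timestamp up to the last processed timestamp.
        return first_ts, last_processed
    if state.mode == "partial":
        if state.start is None or state.end is None:
            raise ValueError("a partial backfill needs both start and end")
        if state.start > state.end:
            raise ValueError("start must not be after end")
        return state.start, state.end
    raise ValueError(f"unknown backfill mode: {state.mode!r}")


# Example: a partial backfill covering Jan 2 to Jan 4.
partial = BackfillState("partial", datetime(2024, 1, 2), datetime(2024, 1, 4))
print(resolve_range(partial, datetime(2023, 1, 1), datetime(2024, 1, 10)))
```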
Backfill Process
When a backfill runs:
- The system determines the appropriate time ranges to process based on:
  - Table's last processed timestamp
  - Configured backfill state
  - Incremental processing interval
- After successful processing, the backfill state is cleared
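The process above can be sketched as a small planning step followed by a run that clears the state only on success. This is a simplified model rather than the system's actual logic: the table dictionary, plan_range, and run_once are hypothetical, and two behaviors are assumptions made for the example (a pending one-time backfill takes priority over the rolling interval, and a normal run never leaves a gap after the last processed date).

```python
from datetime import date, timedelta
from typing import Callable, Optional, Tuple

DateRange = Tuple[date, date]  # inclusive (start, end)


def plan_range(last_processed: date, run_date: date, interval_days: int,
               backfill: Optional[DateRange]) -> DateRange:
    """Choose the range for this run from the three inputs listed above."""
    if backfill is not None:
        # Assumption: a pending one-time backfill takes priority.
        return backfill
    rolling_start = run_date - timedelta(days=interval_days - 1)
    # Assumption: never leave a gap after the last processed date.
    start = min(rolling_start, last_processed + timedelta(days=1))
    return start, run_date


def run_once(table: dict, run_date: date,
             process: Callable[[date, date], None]) -> DateRange:
    """Plan a range, process it, then clear the backfill state."""
    start, end = plan_range(table["last_processed"], run_date,
                            table["interval_days"], table["backfill"])
    process(start, end)  # an exception here leaves the state untouched
    table["backfill"] = None  # cleared only after successful processing
    table["last_processed"] = max(table["last_processed"], end)
    return start, end


table = {
    "last_processed": date(2024, 1, 10),
    "interval_days": 5,
    "backfill": (date(2024, 1, 2), date(2024, 1, 4)),
}
first = run_once(table, date(2024, 1, 11), lambda s, e: None)
second = run_once(table, date(2024, 1, 11), lambda s, e: None)
```

In this model the first run handles the one-time backfill (Jan 2 to Jan 4) and clears it, and the second run falls back to the 5-day rolling window (Jan 7 to Jan 11).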
Best Practices
- Use rolling backfills for predictable data updates
- Configure partial backfills for targeted fixes
- Ensure the backfill end time doesn't exceed the table's last processed timestamp
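The last practice is easy to check before submitting a backfill. The helper below is a hypothetical pre-flight check, not something the data console exposes; it assumes you can look up the table's last processed timestamp yourself.

```python
from datetime import datetime


def check_backfill_end(backfill_end: datetime,
                       last_processed: datetime) -> None:
    """Raise if the backfill would reach past already processed data."""
    if backfill_end > last_processed:
        raise ValueError(
            f"backfill end {backfill_end.isoformat()} is after the last "
            f"processed timestamp {last_processed.isoformat()}"
        )


# An end time equal to the last processed timestamp passes the check.
check_backfill_end(datetime(2024, 1, 10), datetime(2024, 1, 10))
```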
Configuration
Backfills can be configured through the UI in the data console.