Backfill
Backfilling allows you to reprocess historical data in materialized views when you need to fix data quality issues or update business logic. Instead of rebuilding an entire view from scratch, you can selectively delete and recompute data for specific date ranges.
When to backfill
Common reasons to run a backfill:
- Data quality issues were discovered in the source data.
- SQL transformation logic or business rules were updated.
- Late-arriving data needs to be incorporated.
Prerequisites
Backfills are supported only for views that use timestamp-based incremental materialization. Full refresh materialization has no concept of historical date ranges because it always rebuilds the entire table.
Backfill modes
Daily rolling backfill
Some use cases require regular reprocessing of recent historical data, for example when the source system updates records for the past 5 days on a daily basis.
To handle rolling backfills, configure the lookback window (interval_to_process_for_incremental_run) in your view's materialization settings. This parameter specifies how many days of historical data to reprocess on each run. For example, with a 5-day interval:
- Each incremental run will reprocess the last 5 days of data (e.g. Jan 7-11 when processing Jan 11).
- Previously processed dates within the interval are skipped to avoid duplicate processing.
This ensures data stays fresh while avoiding unnecessary reprocessing.
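The date arithmetic behind the 5-day example can be sketched as follows. This is only an illustration: `rolling_window` is a hypothetical helper, not part of the product, and the year 2024 is assumed purely to make the dates concrete. The `interval_days` argument plays the role of `interval_to_process_for_incremental_run`.

```python
from datetime import date, timedelta


def rolling_window(run_date: date, interval_days: int) -> list[date]:
    """Return the dates in a lookback window that ends on run_date (inclusive)."""
    if interval_days < 1:
        raise ValueError("interval_days must be at least 1")
    # A 5-day window ending Jan 11 starts 4 days earlier, on Jan 7.
    start = run_date - timedelta(days=interval_days - 1)
    return [start + timedelta(days=i) for i in range(interval_days)]


window = rolling_window(date(2024, 1, 11), 5)
print(window[0], window[-1])  # 2024-01-07 2024-01-11
```

The window is inclusive on both ends, which is why a 5-day interval spans Jan 7 through Jan 11 rather than Jan 6 through Jan 11.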
One-time backfill
For ad-hoc backfills of specific date ranges, configure the backfill state directly from the view's settings in the Data Catalog.
- Full backfill: Reprocesses all historical data from the view's first timestamp. Use when you need to rebuild the entire table.
- Partial backfill: Reprocesses data for a specific date range by configuring start and end timestamps. Useful for fixing isolated data quality issues without reprocessing everything.
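To make the difference between the two modes concrete, here is a minimal sketch of how a backfill state might map to a time range. The `BackfillState` class and `resolve_range` function are hypothetical names used for illustration; the actual state is set in the Data Catalog. The sketch also assumes that a full backfill ends at the last processed timestamp, which the text above does not state explicitly.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional, Tuple


@dataclass
class BackfillState:
    """Hypothetical stand-in for the backfill state set in the Data Catalog."""
    mode: str                         # "full" or "partial"
    start: Optional[datetime] = None  # required for partial backfills
    end: Optional[datetime] = None    # required for partial backfills


def resolve_range(state: BackfillState, first_ts: datetime,
                  last_processed_ts: datetime) -> Tuple[datetime, datetime]:
    """Return the (start, end) range a backfill would reprocess."""
    if state.mode == "full":
        # Assumption: a full backfill runs from the view's first timestamp
        # up to the last processed timestamp.
        return first_ts, last_processed_ts
    if state.mode == "partial":
        if state.start is None or state.end is None:
            raise ValueError("partial backfill needs both start and end")
        if state.start >= state.end:
            raise ValueError("start must be before end")
        return state.start, state.end
    raise ValueError(f"unknown backfill mode: {state.mode!r}")


first = datetime(2024, 1, 1)
last = datetime(2024, 1, 11)
full = resolve_range(BackfillState("full"), first, last)
partial = resolve_range(
    BackfillState("partial", datetime(2024, 1, 3), datetime(2024, 1, 5)),
    first, last)
```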
Backfill process
When a backfill runs:
- The system determines the appropriate time ranges to process based on:
  - The view's last processed timestamp
  - The configured backfill state
  - The incremental processing interval
- Data for the selected range is deleted and recomputed.
- After successful processing, the backfill state is cleared.
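The delete-and-recompute pattern described above can be sketched with an in-memory SQLite database. This is not the product's implementation: the `events` and `daily_totals` tables, the `run_backfill` function, and the dictionary used as backfill state are all assumptions made for the example.

```python
import sqlite3


def run_backfill(conn: sqlite3.Connection, state: dict) -> None:
    """Delete and recompute daily totals for the range held in state."""
    start, end = state["start"], state["end"]
    with conn:  # one transaction: commits on success, rolls back on error
        conn.execute(
            "DELETE FROM daily_totals WHERE day BETWEEN ? AND ?", (start, end))
        conn.execute(
            "INSERT INTO daily_totals (day, total) "
            "SELECT day, SUM(amount) FROM events "
            "WHERE day BETWEEN ? AND ? GROUP BY day",
            (start, end))
    # Reached only if the transaction committed: clear the backfill state.
    state.clear()


conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE events (day TEXT, amount INTEGER);
CREATE TABLE daily_totals (day TEXT PRIMARY KEY, total INTEGER);
INSERT INTO events VALUES ('2024-01-07', 10), ('2024-01-08', 20), ('2024-01-08', 5);
INSERT INTO daily_totals VALUES ('2024-01-07', 10), ('2024-01-08', 99);
""")

state = {"start": "2024-01-08", "end": "2024-01-08"}
run_backfill(conn, state)
rows = conn.execute("SELECT day, total FROM daily_totals ORDER BY day").fetchall()
print(rows)   # [('2024-01-07', 10), ('2024-01-08', 25)]
print(state)  # {}
```

Running the delete and the recompute in a single transaction means a failed backfill leaves the existing rows untouched, and the state is cleared only after a successful commit.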
Configuration
Backfills can be configured through the UI in the Data Catalog:
- Open the materialized view.
- Navigate to the backfill configuration section.
- Choose full or partial backfill.
- If partial, set the start and end timestamps.
- Save and trigger the refresh.
Best practices
- Use rolling backfills for predictable, recurring data updates.
- Configure partial backfills for targeted fixes to minimize processing time.
- Ensure the backfill end time does not exceed the last processed timestamp.
- Monitor backfill progress to confirm data quality is restored.
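The end-time guideline above can be checked before a partial backfill is saved. The following is a small sketch of such a check; `check_backfill_range` is a hypothetical helper, and the product may or may not perform similar validation on its own.

```python
from datetime import datetime
from typing import List


def check_backfill_range(start: datetime, end: datetime,
                         last_processed: datetime) -> List[str]:
    """Return a list of problems with a proposed partial backfill range."""
    problems = []
    if start >= end:
        problems.append("start must be earlier than end")
    if end > last_processed:
        problems.append("end exceeds the last processed timestamp")
    return problems


last_processed = datetime(2024, 1, 11)
ok = check_backfill_range(datetime(2024, 1, 3), datetime(2024, 1, 5), last_processed)
bad = check_backfill_range(datetime(2024, 1, 9), datetime(2024, 1, 15), last_processed)
print(ok)   # []
print(bad)  # ['end exceeds the last processed timestamp']
```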
Next steps
- Learn about Materialization modes to choose the right refresh strategy.
- Set up automated refreshes with Scheduling.
- Add data quality tests to catch issues early.