What is it about?
Materialized views are precomputed query results that data teams rely on to speed up analytics and ETL pipelines. The hard part is keeping them current as the underlying data changes: recomputing them from scratch is expensive, while hand-tuning incremental updates is complex and error-prone. This paper presents Enzyme, the incremental view maintenance (IVM) engine that powers Databricks' declarative data pipelines. Enzyme treats materialized views as first-class building blocks and automatically plans how to refresh them, choosing for each view between an incremental update and a full recomputation. It makes that choice with a cost-based optimizer, built on Apache Spark, that estimates refresh cost from the system's own history of past executions.
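To make the per-view choice between an incremental update and a full recomputation concrete, here is a minimal Python sketch of a history-based decision. It is an illustration only, not Enzyme's actual optimizer: the function names (`estimate_cost`, `choose_refresh`), the use of a median over past run costs, and the fallback to a full recomputation when no history exists are all assumptions made for this example.

```python
from statistics import median


def estimate_cost(history, default):
    """Estimate the cost of a strategy as the median of its past run costs.

    Hypothetical helper: `history` is a list of observed costs (for example
    CPU-seconds) from earlier refreshes; `default` is used when it is empty.
    """
    return median(history) if history else default


def choose_refresh(incremental_history, full_history, default_cost=float("inf")):
    """Pick the cheaper refresh strategy for one view.

    Assumption for this sketch: when the incremental strategy is not strictly
    cheaper (including when there is no history at all), fall back to a full
    recomputation.
    """
    incremental_cost = estimate_cost(incremental_history, default_cost)
    full_cost = estimate_cost(full_history, default_cost)
    return "incremental" if incremental_cost < full_cost else "full"


# Example: past incremental refreshes were much cheaper than full ones.
print(choose_refresh([12.0, 15.0, 11.0], [90.0, 110.0]))  # incremental
# Example: one very expensive incremental run, cheaper full recomputations.
print(choose_refresh([200.0], [90.0, 110.0]))  # full
```

A production optimizer would weigh far more signals, such as the volume of changed data and the operators in the view definition; the sketch only shows the shape of a decision grounded in observed executions rather than an analytical model.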
Why is it important?
Most industrial IVM systems either support only a narrow set of SQL operators or force users to manually tune refresh strategies. Enzyme removes that burden: users write business logic and the engine handles the mechanics, lowering total cost of ownership. The impact is measured in production: across thousands of large-scale pipelines spanning diverse domains, Enzyme delivers a cumulative reduction of billions of CPU-seconds per day. On the standard TPC-DI benchmark it incrementalizes 100% of the workloads, its cost model picks the right strategy in 87.5% of cases, and its efficiency gains hold steady as data volume grows. These results show that automated, cost-based IVM works reliably at industrial scale in the Lakehouse era.
Perspectives
We built Enzyme to close the gap between decades of incremental-view-maintenance research and what data engineers actually need day to day. The interesting design challenge wasn't only generating efficient incremental plans; it was also the surrounding system: deciding when incrementalization is worth it, grounding cost estimates in real historical executions rather than analytical models, and batching work across collections of views in a pipeline. Seeing the cost model reliably make the right call across production workloads was the most rewarding part.
Ritwik Yadav
Databricks, Inc.
Read the Original
This page is a summary of: Enzyme: Incremental View Maintenance for Data Engineering, May 2026, ACM (Association for Computing Machinery), DOI: 10.1145/3788853.3803098.