Enzyme: Incremental View Maintenance for Data Engineering

Ritwik Yadav; Supun Abeysinghe; Min Yang; Jeffrey Helt; Manuel Ung; Yuhong Chen; Melody Hu; William Wei; Yiming Yang; Tom van Bussel; Sourav Chatterji; Indrajit Roy; Paul Lappas; Yannis Papakonstantinou; Tahir Fayyaz; Bilal Aslam; Ross Bunker; Michael Armbrust; Shrikanth Shankar

doi:10.1145/3788853.3803098

What is it about?

Materialized views are precomputed query results that data teams rely on to speed up analytics and ETL pipelines. The hard part is keeping them current as the underlying data changes — recomputing them from scratch is expensive, but hand-tuning incremental updates is complex and error-prone. This paper presents Enzyme, the incremental view maintenance (IVM) engine that powers Databricks' declarative data pipelines. Enzyme treats materialized views as first-class building blocks and automatically plans how to refresh them, choosing per view between an incremental update and a full recomputation. It does this with a cost-based optimizer, built on Apache Spark, that estimates refresh cost from the system's own history of past executions.
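To make the per-view choice concrete, here is a minimal sketch of a history-driven decision between the two refresh strategies. It is not Enzyme's cost model, which this summary does not describe in detail. The names `PastRefresh`, `cost_per_row`, and `choose_refresh`, the per-row cost heuristic, and the default values are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class PastRefresh:
    """One recorded refresh of a view (hypothetical record layout)."""
    strategy: str       # "incremental" or "full"
    input_rows: int     # changed rows for incremental, all rows for full
    cpu_seconds: float  # measured cost of that refresh


def cost_per_row(history, strategy, default):
    """Average observed CPU-seconds per input row for one strategy."""
    runs = [r for r in history if r.strategy == strategy and r.input_rows > 0]
    if not runs:
        return default  # assumed fallback when no history exists yet
    return mean(r.cpu_seconds / r.input_rows for r in runs)


def choose_refresh(history, changed_rows, total_rows):
    """Pick the strategy with the lower estimated cost for this refresh."""
    incremental = cost_per_row(history, "incremental", 1e-3) * changed_rows
    full = cost_per_row(history, "full", 1e-4) * total_rows
    if incremental <= full:
        return "incremental", incremental
    return "full", full


history = [
    PastRefresh("incremental", 1_000, 2.0),
    PastRefresh("full", 1_000_000, 100.0),
]
# A small change set favors the incremental plan; a very large one does not.
print(choose_refresh(history, changed_rows=10_000, total_rows=1_000_000)[0])
print(choose_refresh(history, changed_rows=900_000, total_rows=1_000_000)[0])
```

The sketch only captures the general idea that incremental work tends to cost more per row but touches far fewer rows, so the better strategy depends on how much of the input changed.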

Why is it important?

Most industrial IVM systems either support only a narrow set of SQL operators or force users to manually tune refresh strategies. Enzyme removes that burden: users write business logic and the engine handles the mechanics, lowering total cost of ownership. The impact is large and measured in production: across thousands of large-scale pipelines spanning diverse domains, Enzyme delivers a cumulative reduction of billions of CPU-seconds per day. On the standard TPC-DI benchmark it incrementalizes 100% of the workloads, its cost model picks the right strategy in 87.5% of cases, and its efficiency gains hold steady as data volume grows. Together, these results show that automated, cost-based IVM works reliably at industrial scale in the Lakehouse era.

Perspectives

We built Enzyme to close the gap between decades of incremental-view-maintenance research and what data engineers actually need day to day. The interesting design challenge wasn't only generating efficient incremental plans. It was also the surrounding system: deciding when incrementalization is worth it, grounding cost estimates in real historical executions rather than analytical models, and batching work across collections of views in a pipeline. Seeing the cost model reliably make the right call across production workloads was the most rewarding part.
Ritwik Yadav
Databricks, Inc.

This page is a summary of: Enzyme: Incremental View Maintenance for Data Engineering, May 2026, ACM (Association for Computing Machinery),
DOI: 10.1145/3788853.3803098.

Resources

Related Content
Enzyme: Incremental View Maintenance for Data Engineering
Full paper on arxiv

Contributors

The following have contributed to this page

Ritwik Yadav
Databricks, Inc.

Enzyme: Automatically keeping data pipelines up to date without recomputing everything
