What is it about?

When databases process large amounts of data, they break it into fixed-size chunks so that per-row overhead is amortized across many rows at once. However, some operations, such as filtering and joins, produce chunks that are far smaller than this target size, and the fixed per-chunk costs then dominate, slowing down queries. Our research addresses this problem by introducing Logical Compaction, a technique that combines small chunks into larger, more efficient ones. We implemented it in DuckDB, a modern analytical database, and observed up to 40% faster query execution. This technique helps databases process data more smoothly, making analytics faster and more efficient.
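To make the idea concrete, here is a minimal sketch (in C++, not DuckDB's actual code) of a compaction step that buffers undersized chunks coming out of a selective filter and only hands full chunks to the next operator. The names Chunk, Compactor, and CHUNK_CAPACITY are illustrative, and the sketch copies rows for simplicity, whereas the Logical Compaction described in the paper is more careful about when and how chunks are merged.

// A minimal, illustrative sketch of chunk compaction (not DuckDB's code).
// Names such as Chunk, Compactor, and CHUNK_CAPACITY are hypothetical.
#include <cstdint>
#include <iostream>
#include <vector>

constexpr size_t CHUNK_CAPACITY = 2048;  // a typical vector size in a vectorized engine

struct Chunk {
    std::vector<int64_t> values;  // a single column, for simplicity
    size_t size() const { return values.size(); }
};

// Buffers undersized chunks and emits full chunks to the next operator.
class Compactor {
public:
    // Returns zero or more full chunks ready for downstream processing.
    std::vector<Chunk> Push(const Chunk &input) {
        std::vector<Chunk> full_chunks;
        for (int64_t v : input.values) {
            buffer_.values.push_back(v);
            if (buffer_.size() == CHUNK_CAPACITY) {
                full_chunks.push_back(std::move(buffer_));
                buffer_ = Chunk{};
            }
        }
        return full_chunks;
    }

    // Flushes whatever remains (a final, possibly small chunk).
    Chunk Finalize() { return std::move(buffer_); }

private:
    Chunk buffer_;
};

int main() {
    Compactor compactor;
    size_t emitted = 0;
    // Simulate a selective filter that produces many tiny chunks.
    for (int i = 0; i < 1000; i++) {
        Chunk small{{1, 2, 3}};  // only 3 rows survive the filter
        emitted += compactor.Push(small).size();
    }
    Chunk tail = compactor.Finalize();
    std::cout << "full chunks: " << emitted
              << ", tail rows: " << tail.size() << "\n";
    return 0;
}

Running this collapses 1,000 three-row chunks into one full 2,048-row chunk plus a 952-row tail, illustrating how thousands of tiny filter outputs become a handful of full-sized chunks for downstream operators.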


Why is it important?

As data grows, databases must handle queries efficiently. Many databases struggle with the small chunk problem, leading to slower processing. Our solution, Logical Compaction, reduces overhead and improves performance, benefiting data analysts, engineers, and researchers. Faster queries mean quicker insights, making this work highly relevant for modern data-intensive applications.

Perspectives

This research was motivated by real-world performance issues in vectorized execution engines like DuckDB. While existing optimizations focus on individual operators, our approach enhances performance across multiple query stages. The success of Logical Compaction opens new possibilities for optimizing future database architectures.

Yiming Qiao

Read the Original

This page is a summary of: Data Chunk Compaction in Vectorized Execution, Proceedings of the ACM on Management of Data, February 2025, ACM (Association for Computing Machinery), DOI: 10.1145/3709676.
You can read the full text at https://doi.org/10.1145/3709676.
