What is it about?
An algorithm is instance-optimal if it has the best possible (asymptotic) run time for every input instance, not just the worst case. TreeTracker Join is a new, instance-optimal join algorithm that runs in time O(|IN| + |OUT|), but has much lower overhead in practice compared to other optimal join algorithms.
Featured Image
Photo by Airam Dato-on on Unsplash
Why is it important?
With the rise of data lakes, database systems can no longer rely on the query optimizer for performance, as there is usually little to no statistics available in data lakes. Systems must therefore achieve robustness at the algorithmic level, and TreeTracker Join bridges the gap between theoretical optimality and practical efficiency.
Perspectives
My favorite part about the algorithm is its simplicity: it's almost exactly the same as the classic binary hash join, with just two simple twists. I hope this will make the algorithm easy to understand and implement.
Remy Wang
University of California Los Angeles
Read the Original
This page is a summary of: TreeTracker Join: Simple, Optimal, Fast, ACM Transactions on Database Systems, October 2025, ACM (Association for Computing Machinery),
DOI: 10.1145/3774325.
You can read the full text:
Contributors
The following have contributed to this page







