2025-01-19
Adaptive factorization using linear-chained hash tables
Publication
Publication
We introduce factorized aggregations and worst-case optimal joins in DuckDB with an adaptive mechanism that only uses them when they enhance query performance. This builds on the adoption of a new hash table design (“Linear-Chained”) for equi-joins. Our first insight is that the collision-free chains of this new design enable efficient factorized and worst-case optimal processing. We further defer the decision to use factorization and worst-case optimal joins from optimization to runtime. Our second insight is that we can obtain accurate statistics, even if the join inputs lack these (e.g. because they are sub-queries or Parquet files), by leveraging runtime heuristics and constructing efficient on-the-fly sketches, during the hash join build. Finally, we show that machine learning models using these metrics can achieve close to optimal performance with a high accuracy. Furthermore, we propose heuristic-based approaches that offer comparable performance to these models, while relying on cheaper to obtain run-time statistics and being more explainable.
Additional Metadata | |
---|---|
The 15th Annual Conference on Innovative Data Systems Research (CIDR ’25) | |
Organisation | Database Architectures |
Gross, P., ten Wolde, D., & Boncz, P. (2025). Adaptive factorization using linear-chained hash tables. In Proceedings of the Conference on Innovative Data Systems Research. |