Analytical database systems offer high-performance in-memory aggregation. If there are many unique groups, tem- porary query intermediates may not fit RAM, requiring the use of external storage. However, switching from an in-memory to an external algorithm can degrade performance sharply. We revisit external hash aggregation on modern hardware, aiming instead for robust performance that avoids a “perfor- mance cliff” when memory runs out. To achieve this, we introduce two techniques for handling temporary query intermediates. First, we propose unifying the memory management of temporary and persistent data. Second, we propose using a page layout that can be spilled to disk despite being optimized for main memory performance. These two techniques allow operator implementations to process larger- than-memory query intermediates with only minor modifications. We integrate these into DuckDB’s parallel hash aggregation. Experimental results show that our implementation gracefully de- grades performance as query intermediates exceed the available memory limit, while main memory performance is competitive with other analytical database systems.

, ,
Database Architectures

Kuiper, L., Boncz, P., & Mühleisen, H. (2024). Robust external hash aggregation in the solid state age.