Modern high-performance analytical query engines follow one of two execution paradigms. Vectorized engines implement an interpreter for relational algebra operators that operates on batches of tuples to maximize performance. Compiling engines, on the other hand, generate optimized and specialized code for every query. This paper unifies these two approaches. We present Incremental Fusion, a novel execution paradigm for modern, high-performance query engines. An Incremental Fusion engine performs operator-fusing code generation -with a twist: The compiling engine generates its own vectorized interpreter. The engine uses a finite set of building blocks below relational algebra for code generation. It can enumerate each building block and generate a vectorized primitive for it. The vectorized interpreter becomes a free byproduct of carefully choosing the right abstraction for code generation. This allows an Incremental Fusion engine to dynamically switch between vectorized interpretation and operator-fusing code generation. We demonstrate Incremental Fusion in our open-source prototype engine InkFuse. We measure InkFuse against the state-of-the-art vectorized and compiling engines DuckDB and Umbra. InkFuse is able to achieve competitive performance both for low-latency processing, and compute-intensive long-running queries.

, , , , ,
doi.org/10.1109/ICDE60146.2024.00042
40th IEEE International Conference on Data Engineering, ICDE 2024
Centrum Wiskunde & Informatica, Amsterdam (CWI), The Netherlands

Wagner, B., Kohn, A., Boncz, P., & Leis, V. (2024). Incremental Fusion: Unifying compiled and vectorized query execution. In Proceedings of the International Conference on Data Engineering (pp. 462–474). doi:10.1109/ICDE60146.2024.00042