Sorting is one of the most well-studied problems in computer science and a vital operation for relational database systems. Despite this, little research has been published on implementing an efficient relational sorting operator. In this work, we explore the design space of sorting in a relational database system. We use micro-benchmarks to explore how to sort relational data efficiently in analytical database systems, taking into account different query execution engines as well as row and columnar data formats. We show that, regardless of architectural differences between query engines, sorting rows is almost always more efficient than sorting columnar data, even if this requires converting the data from columns to rows and back. Sorting rows efficiently is challenging for systems with an interpreted execution engine, as their implementation has to stay generic. We show that these challenges can be overcome with several existing techniques. Based on our findings, we implement a highly optimized row-based sorting approach in the DuckDB open-source in-process analytical database management system, which has a vectorized interpreted query engine. We compare DuckDB with four analytical database systems and find that DuckDB's sort implementation outperforms query engines that sort using a columnar data format.

, ,
2023 IEEE 39th International Conference on Data Engineering (ICDE)
Centrum Wiskunde & Informatica, Amsterdam (CWI), The Netherlands

Kuiper, L.N, & Mühleisen, H.F. (2023). These rows are made for sorting and that's just what we'll do. In Proceedings of the IEEE International Conference on Data Engineering (ICDE) (pp. 2050–2062). doi:10.1109/ICDE55515.2023.00159