Modern columnar databases rely heavily on compression to reduce memory footprint and speed up query execution. These techniques, however, are implemented as a “black box”: their decompression logic is hard-coded and part of the table scan infrastructure. We proposed a novel compression model called white-box compression, which views compression actions as functions over the physical columns stored in a block. Because these functions become visible as expressions in the query plan, the database system can apply many more optimizations, boosting query execution speed. The functions are learnt from the data and also allow the data to be stored much more compactly, by decomposing string values, automatically storing data in appropriate data types, and exploiting correlations between columns. White-box compression opens up a whole new set of research questions. We started with (1) How can white-box compression expressions (functions) be learnt from the data automatically? This Ph.D. research will subsequently study (2) How can white-box compression be leveraged for (run-time) query optimizations? (3) How can white-box compression be integrated in a query engine if the white-box functions may differ for each block of data?
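
To make the idea of compression as a visible function more concrete, the following is a minimal illustrative sketch in Python, not the implementation described in this work. It assumes a string column whose values share a constant prefix followed by a fixed-width number (for example "ID_00042"). The names PrefixIntColumn, compress, decompress and count_equal are hypothetical and chosen only for this example. The logical column is stored as one physical integer column plus an expression that rebuilds the string, and an equality predicate is rewritten to run on the integers without decompressing.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class PrefixIntColumn:
    """Hypothetical physical layout: constant prefix + fixed-width integers."""
    prefix: str
    width: int
    numbers: List[int]


def _parse_suffix(value: str, prefix: str, width: int) -> int:
    """Return the integer suffix of `value`, or raise ValueError if it does not fit."""
    suffix = value[len(prefix):]
    fits = (
        value.startswith(prefix)
        and len(suffix) == width
        and suffix.isascii()
        and suffix.isdigit()
    )
    if not fits:
        raise ValueError(f"value {value!r} does not match the pattern")
    return int(suffix)


def compress(values: List[str], prefix: str) -> PrefixIntColumn:
    """Decompose each string into the shared prefix and an integer."""
    if not values:
        raise ValueError("cannot infer the suffix width from an empty column")
    width = len(values[0]) - len(prefix)
    numbers = [_parse_suffix(v, prefix, width) for v in values]
    return PrefixIntColumn(prefix, width, numbers)


def decompress(col: PrefixIntColumn) -> List[str]:
    """The 'white-box' expression: concat(prefix, zero-padded number)."""
    return [f"{col.prefix}{n:0{col.width}d}" for n in col.numbers]


def count_equal(col: PrefixIntColumn, literal: str) -> int:
    """Evaluate `column = literal` on the integer column, without decompressing."""
    try:
        target = _parse_suffix(literal, col.prefix, col.width)
    except ValueError:
        return 0  # the literal cannot match any stored value
    return sum(1 for n in col.numbers if n == target)


values = ["ID_00042", "ID_00007", "ID_00042"]
col = compress(values, "ID_")
```

In this sketch the round trip decompress(col) returns the original strings, while count_equal(col, "ID_00042") compares integers only. A real system would also need to learn the pattern from the data and handle values that do not fit it, which this example leaves out.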

CEUR Workshop Proceedings
2020 International Conference on Very Large Databases PhD Workshop, VLDB-PhD 2020
Centrum Wiskunde & Informatica, Amsterdam (CWI), The Netherlands

Gomes Tomé, D., & Boncz, P.A. (2020). Redesigning query engines for white-box compression. In VLDB-PhD 2020 - VLDB 2020 PhD Workshop.