IEEE 754 doubles do not exactly represent most real values, introducing rounding errors in computations and in [de]serialization to text. These rounding errors inhibit the use of existing lightweight compression schemes such as Delta and Frame Of Reference (FOR), but several new schemes have recently been proposed: Gorilla, Chimp128, PseudoDecimals (PDE), Elf and Patas. However, their compression ratios are no better than those of general-purpose compressors such as Zstd, while their [de]compression is much slower than that of Delta and FOR. We propose and evaluate ALP, which significantly improves on these previous schemes in both speed and compression ratio (Figure 1). We created ALP after carefully studying the datasets used to evaluate the previous schemes. To obtain speed, ALP is designed to fit vectorized execution. This turned out to be key to improving the compression ratio as well, as we found that commonalities within a vector create compression opportunities. ALP is an adaptive scheme that uses a strongly enhanced version of PseudoDecimals [31] to losslessly encode doubles as integers if they originated as decimals, and otherwise uses vectorized compression of the doubles' front bits. Its high speed stems from our implementation in scalar code that auto-vectorizes, using building blocks provided by our FastLanes library [6], and from an efficient two-stage compression algorithm that first samples row-groups and then vectors.
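The central idea above, encoding doubles that originated as decimals as integers and treating the rest as exceptions, can be illustrated with a short Python sketch. This is a minimal sketch under simplifying assumptions, not ALP itself: it uses a single decimal exponent `e` per vector, an exhaustive exponent search instead of ALP's two-stage sampling of row-groups and vectors, and a made-up cost model. The names `MAX_EXPONENT`, `EXCEPTION_BITS`, `encode_with_exponent`, `frame_of_reference`, `compress` and `decompress` are hypothetical. The sketch also shows why the integer form matters: once values are integers, Frame Of Reference (FOR) reduces them to small deltas that fit in a few bits.

```python
# Hypothetical simplified sketch of decimal-to-integer encoding; NOT the ALP implementation.
import math
from typing import List, Tuple

MAX_EXPONENT = 18         # assumed search bound; 10**18 is exactly representable as a double
EXCEPTION_BITS = 64 + 16  # assumed cost of one exception: raw double + 16-bit position

Exceptions = List[Tuple[int, float]]


def encode_with_exponent(values: List[float], e: int) -> Tuple[List[int], Exceptions]:
    """Scale every double by 10**e and round to an integer. Values whose
    decoding (n / 10**e) does not reproduce the exact same double, including
    the sign of -0.0, become (position, value) exceptions."""
    scale = 10 ** e
    ints: List[int] = []
    exceptions: Exceptions = []
    for pos, v in enumerate(values):
        scaled = v * scale
        if math.isfinite(scaled):  # NaN and infinities always become exceptions
            n = round(scaled)
            decoded = n / scale    # correctly rounded int/int division
            if (-(2**63) <= n < 2**63 and decoded == v
                    and math.copysign(1.0, decoded) == math.copysign(1.0, v)):
                ints.append(n)
                continue
        exceptions.append((pos, v))
        # Filler value (an assumption) so the exception slot does not widen the FOR range much.
        ints.append(ints[-1] if ints else 0)
    return ints, exceptions


def frame_of_reference(ints: List[int]) -> Tuple[int, int, List[int]]:
    """FOR: subtract the minimum; the bit width is what bit-packing would need."""
    base = min(ints)
    deltas = [n - base for n in ints]
    return base, max(deltas).bit_length(), deltas


def compress(values: List[float]):
    """Try every exponent and keep the cheapest estimated encoding.
    Assumes a non-empty vector. Returns (e, base, width, deltas, exceptions)."""
    best = None
    for e in range(MAX_EXPONENT + 1):
        ints, exceptions = encode_with_exponent(values, e)
        base, width, deltas = frame_of_reference(ints)
        cost = len(values) * width + len(exceptions) * EXCEPTION_BITS
        if best is None or cost < best[0]:
            best = (cost, e, base, width, deltas, exceptions)
    return best[1:]


def decompress(e: int, base: int, deltas: List[int], exceptions: Exceptions) -> List[float]:
    scale = 10 ** e
    out = [(base + d) / scale for d in deltas]
    for pos, v in exceptions:  # patch in the values that were not decimals
        out[pos] = v
    return out


prices = [19.99, 20.05, 19.87, 21.0]
e, base, width, deltas, exc = compress(prices)
assert decompress(e, base, deltas, exc) == prices  # lossless round trip
```

For the `prices` vector, the cheapest choice is `e = 2`: every value becomes an integer between 1987 and 2100, so after FOR each one needs 7 bits instead of 64, with no exceptions. Values such as `-0.0`, `inf` or `NaN` never round-trip through an integer and are stored verbatim as exceptions.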

ACM SIGMOD International Conference on Management of Data
Database Architectures

Afroozeh, A., Kuffo Rivero, L. X., & Boncz, P. (2024). ALP: Adaptive lossless floating-point compression. In Proceedings of the ACM International Conference on Management of Data (SIGMOD).