ALP: Adaptive lossless floating-point compression

Afroozeh, Azim; Kuffo Rivero, Leonardo Xavier; Boncz, Peter

2024-06-09

Presented at the ACM SIGMOD International Conference on Management of Data (June 2024), Santiago, Chile

IEEE 754 doubles do not exactly represent most real values, introducing rounding errors in computations and [de]serialization to text. These rounding errors inhibit the use of existing lightweight compression schemes such as Delta and Frame Of Reference (FOR), but recently new schemes were proposed: Gorilla, Chimp128, PseudoDecimals (PDE), Elf and Patas. However, their compression ratios are not better than those of general-purpose compressors such as Zstd, while their [de]compression is much slower than that of Delta and FOR. We propose and evaluate ALP, which significantly improves on these previous schemes in both speed and compression ratio (Figure 1). We created ALP after carefully studying the datasets used to evaluate the previous schemes. To obtain speed, ALP is designed to fit vectorized execution. This turned out to be key for also improving the compression ratio, as we found that commonalities within a vector create compression opportunities. ALP is an adaptive scheme that uses a strongly enhanced version of PseudoDecimals [31] to losslessly encode doubles as integers if they originated as decimals, and otherwise uses vectorized compression of the doubles' front bits. Its high speed stems from our implementation in scalar code that auto-vectorizes, using building blocks provided by our FastLanes library [6], and from an efficient two-stage compression algorithm that first samples row-groups and then vectors.

Additional Metadata
Keywords	Lossless compression, Floating point compression, Lightweight compression, Vectorized execution, Columnar storage, Big data formats
Conference	ACM SIGMOD International Conference on Management of Data
Organisation	Database Architectures
Citation	Afroozeh, A., Kuffo Rivero, L. X., & Boncz, P. (2024). ALP: Adaptive lossless floating-point compression. In Proceedings of the ACM International Conference on Management of Data (SIGMOD).