The FastLanes compression layout: Decoding >100 billion integers per second with scalar code

Afroozeh, Azim; Boncz, Peter

doi:10.14778/3598581.3598587

A. Afroozeh (Azim) and P.A. Boncz (Peter)

2023-07-10


Proceedings of the VLDB Endowment, Volume 16, Issue 9, p. 2132-2144

The open-source FastLanes project aims to improve big data formats, such as Parquet, ORC and columnar database formats, in multiple ways. In this paper, we significantly accelerate decoding of all common Light-Weight Compression (LWC) schemes (DICT, FOR, DELTA and RLE) through better data-parallelism. We do so by redesigning the compression layout using two main ideas: (i) generalizing the value interleaving technique in the basic operation of bit-(un)packing by targeting a virtual 1024-bit SIMD register, and (ii) reordering the tuples in all columns of a table in the same Unified Transposed Layout that puts tuple chunks in a common "04261537" order (explained in the paper), allowing for maximum independent work for all possible basic SIMD lane widths: 8, 16, 32, and 64 bits. We address the software development, maintenance and future-proofing challenges of increasing hardware diversity by defining a virtual 1024-bit instruction set that consists of simple operators supported by all SIMD dialects and also, importantly, by scalar code. The interleaved and tuple-reordered layout actually makes scalar decoding faster, extracting more data-parallelism from today’s wide-issue CPUs. Importantly, the scalar version can be fully auto-vectorized by modern compilers, eliminating technical debt in software caused by platform-specific SIMD intrinsics. Micro-benchmarks on Intel, AMD, Apple and AWS CPUs show that FastLanes accelerates decoding by factors (decoding >40 values per CPU cycle). FastLanes can make queries faster, as compressing the data reduces bandwidth needs, while decoding is almost free.
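To make idea (i) concrete, the following is a minimal scalar sketch of interleaved bit-(un)packing fused with FOR decoding, assuming 32-bit lanes so that a virtual 1024-bit register holds 32 lanes. It is an illustration under stated assumptions, not the FastLanes implementation: the names pack_for, unpack_for and low_mask and the exact word layout are hypothetical, and the Unified Transposed Layout of idea (ii) is not modeled. The inner loop over lanes touches contiguous words with no cross-lane dependency, which is the kind of loop a compiler may be able to auto-vectorize.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Assumed geometry: a virtual 1024-bit register split into 32-bit lanes.
constexpr std::size_t kLaneBits = 32;
constexpr std::size_t kLanes = 1024 / kLaneBits;  // 32 lanes
constexpr std::size_t kValues = 1024;             // values per block
constexpr std::size_t kRows = kValues / kLanes;   // 32 values per lane

// Mask with the lowest `width` bits set (width in 1..32).
inline std::uint32_t low_mask(unsigned width) {
    return width == 32 ? 0xFFFFFFFFu : ((1u << width) - 1u);
}

// Pack 1024 values as (value - base) using `width` bits each.
// Value i is assigned to lane i % 32; word w of lane l is stored at
// packed[w * 32 + l], so the 32 lanes of one "register" are contiguous.
std::vector<std::uint32_t> pack_for(const std::vector<std::uint32_t>& in,
                                    std::uint32_t base, unsigned width) {
    assert(in.size() == kValues && width >= 1 && width <= 32);
    std::vector<std::uint32_t> packed(width * kLanes, 0);
    const std::uint32_t mask = low_mask(width);
    for (std::size_t row = 0; row < kRows; ++row) {
        const std::size_t bit = row * width;       // bit offset inside each lane
        const std::size_t word = bit / kLaneBits;
        const unsigned shift = static_cast<unsigned>(bit % kLaneBits);
        const bool spills = shift + width > kLaneBits;
        for (std::size_t lane = 0; lane < kLanes; ++lane) {
            const std::uint32_t v = (in[row * kLanes + lane] - base) & mask;
            packed[word * kLanes + lane] |= v << shift;
            if (spills) {  // value straddles two words of the same lane
                packed[(word + 1) * kLanes + lane] |= v >> (kLaneBits - shift);
            }
        }
    }
    return packed;
}

// Unpack and add the FOR base back in a single pass.
std::vector<std::uint32_t> unpack_for(const std::vector<std::uint32_t>& packed,
                                      std::uint32_t base, unsigned width) {
    assert(width >= 1 && width <= 32 && packed.size() == width * kLanes);
    std::vector<std::uint32_t> out(kValues);
    const std::uint32_t mask = low_mask(width);
    for (std::size_t row = 0; row < kRows; ++row) {
        const std::size_t bit = row * width;
        const std::size_t word = bit / kLaneBits;
        const unsigned shift = static_cast<unsigned>(bit % kLaneBits);
        const bool spills = shift + width > kLaneBits;
        // Every lane does identical, independent work on adjacent words.
        for (std::size_t lane = 0; lane < kLanes; ++lane) {
            std::uint32_t v = packed[word * kLanes + lane] >> shift;
            if (spills) {
                v |= packed[(word + 1) * kLanes + lane] << (kLaneBits - shift);
            }
            out[row * kLanes + lane] = (v & mask) + base;
        }
    }
    return out;
}
```

In this sketch a block of 1024 values at 9 bits each occupies 9 x 32 = 288 words, and unpack_for returns the values in their original order. Shift amounts depend only on the row, never on the lane, so all lanes share the same shift and mask within one inner-loop pass.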

Additional Metadata
Persistent URL	doi.org/10.14778/3598581.3598587
Journal	Proceedings of the VLDB Endowment
Organisation	Centrum Wiskunde & Informatica, Amsterdam (CWI), The Netherlands
Citation	Afroozeh, A., & Boncz, P. (2023). The FastLanes compression layout: Decoding >100 billion integers per second with scalar code. Proceedings of the VLDB Endowment, 16(9), 2132–2144. doi:10.14778/3598581.3598587


See Also
techReport The FastLanes Compression Layout: Decoding >100 billion integers per second with scalar code A. Afroozeh (Azim) and P.A. Boncz (Peter)
software|data cwida/FastLanes A. Afroozeh (Azim)
