We show that compression can be a win-win for GPU data processing: it not only allows more data to be stored in GPU global memory, but can also accelerate data processing. We show that the complete redesign of compressed columnar storage in FastLanes, with its fully data-parallel bit-packing and encodings, also benefits GPU hardware. We micro-benchmark the performance of FastLanes on two GPU architectures (Nvidia T4 and V100) and integrate FastLanes into the Crystal GPU query processing prototype. Our experiments show that FastLanes decompression significantly outperforms previous decompression methods in micro-benchmarks, and can make end-to-end SSB queries up to twice as fast as uncompressed query processing, in contrast to previous work where GPU decompression caused execution to slow down. We further discovered that an access granularity of decoding vectors of 1024 values is too large for a single GPU warp due to register pressure. We mitigate this here using mini-vectors; an open question for future work is how to further reduce this granularity with minimal impact on efficiency.
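
To make the granularity issue concrete, the following is a minimal Python sketch of lane-interleaved bit-packing, not the actual FastLanes layout or its CUDA kernels. It assumes a simplified scheme in which value `i` of a 1024-value vector belongs to lane `i % 32` (one lane per thread of a 32-thread warp), and each lane packs its 32 values independently into `width` 32-bit words. The names `pack` and `unpack_rows`, and the mini-vector size of 8 rows, are hypothetical choices made for illustration only.

```python
VECTOR_SIZE = 1024                      # values per vector, as in the abstract
LANES = 32                              # assumption: one lane per warp thread
VALUES_PER_LANE = VECTOR_SIZE // LANES  # 32 values decoded by each lane
WORD_BITS = 32


def pack(values, width):
    """Pack 1024 unsigned ints of `width` bits into width * LANES 32-bit words."""
    assert len(values) == VECTOR_SIZE and 0 < width <= WORD_BITS
    mask = (1 << width) - 1
    packed = [0] * (width * LANES)
    for lane in range(LANES):
        for row in range(VALUES_PER_LANE):
            v = values[row * LANES + lane]
            assert 0 <= v <= mask
            word, shift = divmod(row * width, WORD_BITS)
            # Low bits go into the current word of this lane.
            packed[word * LANES + lane] |= (v << shift) & 0xFFFFFFFF
            # Bits that do not fit spill into the lane's next word.
            if shift + width > WORD_BITS:
                packed[(word + 1) * LANES + lane] |= v >> (WORD_BITS - shift)
    return packed


def unpack_rows(packed, width, row_start, row_count):
    """Decode `row_count` rows (row_count * LANES values) starting at `row_start`.

    Decoding all 32 rows yields the full vector; decoding a smaller range
    models a hypothetical mini-vector with fewer outputs per lane.
    """
    mask = (1 << width) - 1
    out = [0] * (row_count * LANES)
    for lane in range(LANES):           # on a GPU, each lane would be a thread
        for r in range(row_count):
            word, shift = divmod((row_start + r) * width, WORD_BITS)
            v = packed[word * LANES + lane] >> shift
            if shift + width > WORD_BITS:
                v |= packed[(word + 1) * LANES + lane] << (WORD_BITS - shift)
            out[r * LANES + lane] = v & mask
    return out


def demo(width=3, mini_rows=8):
    """Round-trip a vector, once in full and once in mini-vector chunks."""
    values = [(i * 7) % (1 << width) for i in range(VECTOR_SIZE)]
    packed = pack(values, width)
    full = unpack_rows(packed, width, 0, VALUES_PER_LANE)
    chunked = []
    for start in range(0, VALUES_PER_LANE, mini_rows):
        chunked.extend(unpack_rows(packed, width, start, mini_rows))
    return values, packed, full, chunked
```

Under these assumptions, decoding a full vector leaves each lane holding 1024 / 32 = 32 decoded values at once, whereas a mini-vector of 8 rows leaves it holding only 8. This is one plausible reading of why a smaller decoding granularity relieves register pressure; the mini-vector design evaluated in the paper may differ.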

20th International Workshop on Data Management on New Hardware, DaMoN 2024
Centrum Wiskunde & Informatica, Amsterdam (CWI), The Netherlands

Afroozeh, A., Felius, L. (Lotte), & Boncz, P. (2024). Accelerating GPU Data Processing using FastLanes Compression. In 20th International Workshop on Data Management on New Hardware, DaMoN 2024. doi:10.1145/3662010.3663450