We propose a novel method to represent compressed bitmaps. Like existing bitmap compression schemes, we exploit the compression potential of bitmaps populated with runs of consecutive identical bits, i.e., 0-runs and 1-runs. In contrast to prior work, however, our approach employs a binary tree structure to represent runs of various lengths: leaf nodes in the upper tree levels represent longer runs, and leaf nodes in the lower levels represent shorter ones. The tree-based representation results in high compression ratios and enables efficient random access, which in turn allows for fast intersection of bitmaps. Our experimental analysis with randomly generated bitmaps shows that our approach significantly improves over state-of-the-art compression techniques when bitmaps are dense and/or only barely clustered. Further, we evaluate our approach with real-world data sets, showing that our tree-encoded bitmaps can save up to one third of the space compared with existing techniques.
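
The idea of letting leaf depth stand for run length can be illustrated with a small sketch. It is a simplified, pointer-based illustration written for this summary and does not attempt to reproduce the paper's actual encoding. The names `Node`, `build`, `get`, `intersect`, `to_bits`, and `count_nodes` are hypothetical, and the bitmap length is assumed to be a power of two.

```python
from dataclasses import dataclass
from typing import List, Optional, Sequence


@dataclass(frozen=True)
class Node:
    # A leaf carries `bit` and stands for a run of identical bits.
    # An inner node has two children, each covering half of its range.
    bit: Optional[int] = None
    left: Optional["Node"] = None
    right: Optional["Node"] = None

    @property
    def is_leaf(self) -> bool:
        return self.bit is not None


def _merge(left: Node, right: Node) -> Node:
    # Two equal leaves collapse into a single leaf one level up,
    # so longer runs end up closer to the root.
    if left.is_leaf and right.is_leaf and left.bit == right.bit:
        return Node(bit=left.bit)
    return Node(left=left, right=right)


def build(bits: Sequence[int]) -> Node:
    """Build a pruned binary tree over `bits` (assumed power-of-two length)."""
    n = len(bits)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of two")
    if n == 1:
        return Node(bit=bits[0])
    half = n // 2
    return _merge(build(bits[:half]), build(bits[half:]))


def get(node: Node, n: int, pos: int) -> int:
    """Random access: descend from the root to the leaf covering `pos`."""
    while not node.is_leaf:
        n //= 2
        if pos < n:
            node = node.left
        else:
            node, pos = node.right, pos - n
    return node.bit


def intersect(a: Node, b: Node) -> Node:
    """Bitwise AND of two trees that cover the same length."""
    if a.is_leaf:
        return b if a.bit else a  # 1-run keeps b, 0-run stays a 0-run
    if b.is_leaf:
        return a if b.bit else b
    return _merge(intersect(a.left, b.left), intersect(a.right, b.right))


def to_bits(node: Node, n: int) -> List[int]:
    """Decode a tree covering `n` bits back into a plain list."""
    if node.is_leaf:
        return [node.bit] * n
    return to_bits(node.left, n // 2) + to_bits(node.right, n // 2)


def count_nodes(node: Node) -> int:
    if node.is_leaf:
        return 1
    return 1 + count_nodes(node.left) + count_nodes(node.right)
```

In this sketch, the 8-bit bitmap `[0, 0, 0, 0, 1, 1, 0, 1]` needs 7 nodes instead of the 15 of a full binary tree, because the 0-run of length four becomes a single leaf directly below the root. Intersection can skip whole subtrees whenever one side is a leaf, which is the property the abstract attributes to efficient random access.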

bitmap, compression, data structure, indexing, succinct
doi.org/10.1145/3318464.3380588
ACM SIGMOD International Conference on Management of Data
Centrum Wiskunde & Informatica, Amsterdam, The Netherlands

Lang, H, Beischl, A, Leis, V, Boncz, P.A, Neumann, T, & Kemper, A. (2020). Tree-Encoded Bitmaps. In Proceedings of the ACM SIGMOD International Conference on Management of Data (pp. 937–967). doi:10.1145/3318464.3380588