Cuckoo index: a lightweight secondary index structure

Kipf, Andreas; Chromejko, Damian; Hall, Alexander; Boncz, Peter; Andersen, David

doi:10.14778/3424573.3424577

A. Kipf (Andreas), D. Chromejko (Damian), A. Hall (Alexander), P.A. Boncz (Peter) and D.G. Andersen (David)

2020-10-27

In modern data warehousing, data skipping is essential for high query performance. While index structures such as B-trees or hash tables allow for precise pruning, their large storage requirements make them impractical for indexing secondary columns. Therefore, many systems rely on approximate indexes such as min/max sketches (ZoneMaps) or Bloom filters for cost-effective data pruning. For example, Google PowerDrill skips more than 90% of data on average using such indexes. In this paper, we introduce Cuckoo Index (CI), an approximate secondary index structure that represents the many-to-many relationship between keys and data partitions in a highly space-efficient way. At its core, CI associates variable-sized fingerprints in a Cuckoo filter with compressed bitmaps indicating qualifying partitions. With our approach, we target equality predicates in a read-only (immutable) setting and optimize for space efficiency under the premise of practical build and lookup performance. In contrast to per-partition (Bloom) filters, CI produces correct results for lookups with keys that occur in the data. For lookups with non-occurring keys, CI makes it possible to control the ratio of false-positive partitions. Our experiments with real-world and synthetic data show that, for low-to-medium cardinality columns, CI consumes significantly less space than per-partition filters at the same pruning power. For high cardinality columns, CI is on par with its baselines.
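
To illustrate the core idea of pairing fingerprints in a Cuckoo filter with partition bitmaps, the following is a minimal toy sketch in Python and not the authors' implementation. Everything in it is an assumption made for illustration: the class name `ToyCuckooIndex`, the BLAKE2-based hashing, four slots per bucket, fixed-size fingerprints, and plain integer bitmaps. Unlike CI as described above, it uses neither variable-sized fingerprints nor bitmap compression. Keys whose fingerprints collide simply share a merged bitmap, so a lookup for an occurring key returns a superset of its partitions rather than the exact set.

```python
import hashlib
import random


def _h(data: bytes, salt: bytes) -> int:
    """Deterministic 64-bit hash (the built-in hash() is randomized per process)."""
    return int.from_bytes(hashlib.blake2b(salt + data, digest_size=8).digest(), "big")


class ToyCuckooIndex:
    """Toy key -> partition-bitmap index (hypothetical, for illustration only)."""

    SLOTS_PER_BUCKET = 4
    MAX_KICKS = 500

    def __init__(self, partitions, fp_bits=12):
        self.fp_bits = fp_bits
        # Gather one bitmap per distinct key: bit p is set if partition p has the key.
        key_bitmaps = {}
        for pid, keys in enumerate(partitions):
            for key in keys:
                key_bitmaps[key] = key_bitmaps.get(key, 0) | (1 << pid)
        # Power-of-two bucket count so the XOR trick below stays in range.
        n = 1
        while n * self.SLOTS_PER_BUCKET < 2 * max(1, len(key_bitmaps)):
            n *= 2
        self.num_buckets = n
        self.buckets = [[] for _ in range(n)]
        self._rng = random.Random(0)
        for key, bitmap in key_bitmaps.items():
            self._insert(key, bitmap)

    def _fingerprint(self, key):
        return _h(key.encode(), b"fp") % (1 << self.fp_bits)

    def _alt(self, bucket, fp):
        # Partial-key cuckoo hashing: the other bucket depends only on the fingerprint.
        return (bucket ^ _h(fp.to_bytes(8, "big"), b"alt")) % self.num_buckets

    def _bucket_pair(self, key, fp):
        b1 = _h(key.encode(), b"bucket") % self.num_buckets
        return b1, self._alt(b1, fp)

    def _insert(self, key, bitmap):
        fp = self._fingerprint(key)
        b1, b2 = self._bucket_pair(key, fp)
        # Simplification: a colliding fingerprint shares (ORs into) the existing bitmap.
        for b in (b1, b2):
            for slot in self.buckets[b]:
                if slot[0] == fp:
                    slot[1] |= bitmap
                    return
        for b in (b1, b2):
            if len(self.buckets[b]) < self.SLOTS_PER_BUCKET:
                self.buckets[b].append([fp, bitmap])
                return
        # Both buckets are full: evict a random entry and move it to its other bucket.
        b = self._rng.choice((b1, b2))
        entry = [fp, bitmap]
        for _ in range(self.MAX_KICKS):
            victim = self._rng.randrange(self.SLOTS_PER_BUCKET)
            self.buckets[b][victim], entry = entry, self.buckets[b][victim]
            b = self._alt(b, entry[0])
            if len(self.buckets[b]) < self.SLOTS_PER_BUCKET:
                self.buckets[b].append(entry)
                return
        raise RuntimeError("build failed; retry with more buckets")

    def lookup(self, key):
        """Return the partition ids that may contain `key` ([] means skip everything)."""
        fp = self._fingerprint(key)
        for b in self._bucket_pair(key, fp):
            for slot_fp, bitmap in self.buckets[b]:
                if slot_fp == fp:
                    return [p for p in range(bitmap.bit_length()) if (bitmap >> p) & 1]
        return []


# Four partitions of a hypothetical "country" column.
partitions = [["DE", "FR"], ["FR", "US"], ["US"], ["DE", "US"]]
index = ToyCuckooIndex(partitions)
print(index.lookup("FR"))  # candidate partitions for the predicate country = 'FR'
```

A query with an equality predicate would scan only the partitions returned by `lookup`. In this sketch, raising `fp_bits` makes fingerprint collisions rarer, which loosely mirrors the space versus false-positive trade-off described in the abstract.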

Additional Metadata
Persistent URL	doi.org/10.14778/3424573.3424577
Organisation	Database Architectures
Citation	Kipf, A., Chromejko, D., Hall, A., Boncz, P., & Andersen, D. (2020). Cuckoo index: a lightweight secondary index structure. Proceedings of the VLDB Endowment, 3559–3572. doi:10.14778/3424573.3424577
