2013-06-01
Column Imprints: A Secondary Index Structure
Publication
Publication
Presented at the
ACM SIGMOD International Conference on Management of Data, New York, NY
Large scale data warehouses rely heavily on secondary indexes,
such as bitmaps and b-trees, to limit access to slow IO devices.
However, with the advent of large main memory systems, cache
conscious secondary indexes are needed to improve also the transfer
bandwidth between memory and cpu. In this paper, we introduce
column imprint, a simple but efficient cache conscious secondary
index. A column imprint is a collection of many small bit
vectors, each indexing the data points of a single cacheline. An
imprint is used during query evaluation to limit data access and
thus minimize memory traffic. The compression for imprints is
cpu friendly and exploits the empirical observation that data often
exhibits local clustering or partial ordering as a side-effect of the
construction process. Most importantly, column imprint compression
remains effective and robust even in the case of unclustered
data, while other state-of-the-art solutions fail. We conducted an
extensive experimental evaluation to assess the applicability and
the performance impact of the column imprints. The storage overhead,
when experimenting with real world datasets, is just a few
percent over the size of the columns being indexed. The evaluation
time for over 40000 range queries of varying selectivity revealed
the efficiency of the proposed index compared to zonemaps and
bitmaps with WAH compression.
Additional Metadata | |
---|---|
ACM SIGMOD International Conference on Management of Data | |
Organisation | Database Architectures |
Sidirourgos, E., & Kersten, M. (2013). Column Imprints: A Secondary Index Structure. In Proceedings of ACM SIGMOD International Conference on Management of Data 2013 (SIGMOD ). |