Optimistically compressed Hash Tables & Strings in the USSR

Gubner, Tim; Leis, Viktor; Boncz, Peter

2021-05-31

ACM SIGMOD Record, Volume 50, Issue 1, pp. 60–67

Modern query engines rely heavily on hash tables for query processing. Overall query performance and memory footprint are often determined by how hash tables and the tuples within them are represented. In this work, we propose three complementary techniques to improve this representation. Domain-Guided Prefix Suppression bit-packs keys and values tightly to reduce hash table record width. Optimistic Splitting decomposes values (and operations on them) into (operations on) frequently- and infrequently-accessed value slices. By removing the infrequently-accessed value slices from the hash table record, it improves cache locality. The Unique Strings Self-aligned Region (USSR) accelerates the handling of frequently occurring strings, which are widespread in real-world data sets, by creating an on-the-fly dictionary of the most frequent strings. This allows many string operations to be executed with integer logic and reduces memory pressure. We integrated these techniques into Vectorwise. On the TPC-H benchmark, our approach reduces peak memory consumption by 2–4x and improves performance by up to 1.5x. On a real-world BI workload, we measured a 2x improvement in performance, and in micro-benchmarks we observed speedups of up to 25x.
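To make the bit-packing idea concrete, the following is a minimal Python sketch of domain-guided packing, not the implementation described in the paper. It assumes each column's minimum and maximum are known in advance (for example from table statistics), stores each value as an offset from the minimum, and concatenates the offsets into one integer. The names `Domain`, `pack` and `unpack` are hypothetical, and raising an error on out-of-domain values is a simplification of this sketch.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Domain:
    """Hypothetical per-column value range, assumed known before packing."""
    lo: int  # smallest value the column can hold
    hi: int  # largest value the column can hold

    @property
    def bits(self) -> int:
        # Bits needed to store any offset (value - lo) within the domain.
        return (self.hi - self.lo).bit_length()


def pack(values, domains):
    """Pack one value per column into a single integer, lowest column first."""
    word, shift = 0, 0
    for value, domain in zip(values, domains):
        if not domain.lo <= value <= domain.hi:
            # Simplification: this sketch rejects values outside the domain.
            raise ValueError(f"{value} outside [{domain.lo}, {domain.hi}]")
        word |= (value - domain.lo) << shift  # store the offset, not the value
        shift += domain.bits
    return word


def unpack(word, domains):
    """Invert pack(): extract each offset and add the column minimum back."""
    values = []
    for domain in domains:
        mask = (1 << domain.bits) - 1
        values.append((word & mask) + domain.lo)
        word >>= domain.bits
    return values


# Three columns: a year, a small count, and a signed adjustment.
domains = [Domain(1992, 1998), Domain(0, 50), Domain(-10, 10)]
record = pack([1995, 42, -3], domains)
total_bits = sum(d.bits for d in domains)  # 3 + 6 + 5 bits
```

In this example the three columns need 14 bits in total, so a record that would occupy three machine integers when stored unpacked fits into a single 16-bit slot. Hash-table records narrowed this way are what the abstract refers to as reduced record width.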

Additional Metadata
Journal	ACM SIGMOD Record
Remark	This is a minor revision of the paper entitled “Efficient Query Processing with Optimistically Compressed Hash Tables & Strings in the USSR”
Organisation	Database Architectures
Citation	Gubner, T., Leis, V., & Boncz, P. (2021). Optimistically compressed Hash Tables & Strings in the USSR. ACM SIGMOD Record, 50(1), 60–67.

Free Full Text (Final Version, 377 kB)

See Also
inProceedings Efficient query processing with Optimistically Compressed Hash Tables & Strings in the USSR T.K. Gubner (Tim), V. Leis (Viktor) and P.A. Boncz (Peter)
dataset Public BI benchmark P.A. Boncz (Peter)
software|data Public BI benchmark B. Ghita (Bogdan), S. Manegold (Stefan) and P.A. Boncz (Peter)
