White-box compression: Learning and exploiting compact table representations

Ghita, Bogdan; Gomes Tomé, Diego; Boncz, Peter

B. Ghita (Bogdan), D. Gomes Tomé (Diego) and P.A. Boncz (Peter)

2020-01-12


Presented at the Biennial Conference on Innovative Data Systems Research (January 2020), Amsterdam, The Netherlands

We formulate a conceptual model for white-box compression, which represents the logical columns in tabular data as an openly defined function over some actually stored physical columns. Each block of data should thus be accompanied by a header that describes this functional mapping. Because these compression functions are openly defined, database systems can exploit them during query optimization and execution, enabling, for example, better filter predicate pushdown. In addition, we show that white-box compression is able to identify a broad variety of new opportunities for compression, leading to much better compression factors. These opportunities are identified by an automatic process that learns the functions from the data, for which we provide a recursive, pattern-driven algorithm. Finally, we demonstrate the effectiveness of white-box compression on a new benchmark that we contribute: the Public BI benchmark, which provides a rich set of real-world datasets.
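The core idea, a logical column defined as an open function over stored physical columns, can be illustrated with a minimal Python sketch. It assumes a single hypothetical pattern (a constant string prefix followed by a fixed-width, zero-padded number) and is not the authors' prototype: the names `ConcatMapping`, `learn_mapping`, `compress` and `filter_eq` are illustrative only, and the paper's learning algorithm is recursive and covers many more patterns. The sketch also shows how an equality filter on the logical column can be pushed down by inverting the mapping and evaluating the predicate directly on the stored integers.

```python
from dataclasses import dataclass
from string import digits
from typing import List, Optional, Tuple


@dataclass
class ConcatMapping:
    """Hypothetical block-header entry: logical value = prefix + zero-padded stored integer."""
    prefix: str
    width: int

    def decode(self, value: int) -> str:
        # Reconstruct the logical string from the physically stored integer.
        return f"{self.prefix}{value:0{self.width}d}"

    def invert(self, literal: str) -> Optional[int]:
        # Translate a logical literal into the physical domain; None means the
        # mapping can never produce this literal.
        if not literal.startswith(self.prefix):
            return None
        tail = literal[len(self.prefix):]
        if len(tail) != self.width or not all(c in digits for c in tail):
            return None
        return int(tail)


def split_trailing_digits(s: str) -> Tuple[str, str]:
    # Split e.g. "ID_00042" into ("ID_", "00042").
    i = len(s)
    while i > 0 and s[i - 1] in digits:
        i -= 1
    return s[:i], s[i:]


def learn_mapping(column: List[str]) -> Optional[ConcatMapping]:
    # Accept the pattern only if every value shares one prefix and one
    # number of trailing digits (a single, non-recursive pattern).
    parts = [split_trailing_digits(s) for s in column]
    prefixes = {p for p, _ in parts}
    widths = {len(d) for _, d in parts}
    if len(prefixes) != 1 or len(widths) != 1:
        return None
    width = widths.pop()
    if width == 0:
        return None
    return ConcatMapping(prefixes.pop(), width)


def compress(column: List[str]) -> List[int]:
    # Physical column: only the integers are stored; the prefix lives in the header.
    return [int(split_trailing_digits(s)[1]) for s in column]


def filter_eq(mapping: ConcatMapping, physical: List[int], literal: str) -> List[int]:
    # Predicate pushdown: rewrite `logical == literal` into `physical == k`.
    target = mapping.invert(literal)
    if target is None:
        return []  # the literal cannot occur in this block
    return [row for row, v in enumerate(physical) if v == target]


column = ["ID_00042", "ID_00007", "ID_01234"]
mapping = learn_mapping(column)
physical = compress(column)
print(mapping, physical)  # ConcatMapping(prefix='ID_', width=5) [42, 7, 1234]
print(filter_eq(mapping, physical, "ID_00007"))  # [1]
```

Because the mapping is stated openly in the header, a literal with a different prefix or length is rejected from the header alone, without touching the stored integers. A fuller learner would try many such candidate functions, recursively, and keep the representation with the smallest physical size.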

We believe our basic prototype for white-box compression opens the way for future research into transparent compressed data representations on the one hand, and into database system architectures that can efficiently exploit these on the other. It should be seen as another step in the direction of data management systems that are self-learning and optimize themselves for the data they are deployed on.

Additional Metadata
Conference	Biennial Conference on Innovative Data Systems Research
Organisation	Database Architectures
Citation	Ghita, B., Gomes Tomé, D., & Boncz, P. (2020, January 12). White-box compression: Learning and exploiting compact table representations. Proceedings of the Conference on Innovative Data Systems Research.

See Also
Software/data: Public BI benchmark, B. Ghita (Bogdan), S. Manegold (Stefan) and P.A. Boncz (Peter)