Strings are prevalent in real-world data sets. They often occupy a large fraction of the data and are slow to process. In this work, we present Fast Static Symbol Table (FSST), a lightweight compression scheme for strings. On text data, FSST matches or exceeds the compression and decompression speed of the best speed-optimized compression methods, such as LZ4, while achieving significantly better compression factors. Moreover, its use of a static symbol table allows random access to individual compressed strings, enabling lazy decompression and query processing on compressed data. We believe these features will make FSST a valuable piece in the standard compression toolbox.
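
To illustrate why a static symbol table permits random access, the following is a minimal toy sketch in Python, not the FSST implementation. It assumes a hand-picked table (`SYMBOLS`), one-byte codes, symbols of at most 8 bytes, greedy longest-match encoding, and an escape code of 255 for bytes that no symbol covers. All names are hypothetical, and how FSST actually constructs a good table from sample data is not shown.

```python
# Toy static symbol table: illustrative only, not the FSST implementation.
ESCAPE = 255  # assumed escape code: the byte that follows is a literal


def build_lookup(symbols):
    """Map each symbol (bytes of length 1-8) to a one-byte code by position."""
    assert len(symbols) <= 255 and all(1 <= len(s) <= 8 for s in symbols)
    return {s: code for code, s in enumerate(symbols)}


def compress(data, symbols):
    lookup = build_lookup(symbols)
    max_len = max((len(s) for s in symbols), default=0)
    out = bytearray()
    i = 0
    while i < len(data):
        # Greedy: try the longest candidate symbol at position i first.
        for n in range(min(max_len, len(data) - i), 0, -1):
            code = lookup.get(data[i:i + n])
            if code is not None:
                out.append(code)
                i += n
                break
        else:
            # No symbol matches: emit the escape code, then the raw byte.
            out += bytes([ESCAPE, data[i]])
            i += 1
    return bytes(out)


def decompress(blob, symbols):
    out = bytearray()
    i = 0
    while i < len(blob):
        if blob[i] == ESCAPE:
            out.append(blob[i + 1])
            i += 2
        else:
            out += symbols[blob[i]]
            i += 1
    return bytes(out)


# Hypothetical table; a real one would be derived from the data.
SYMBOLS = [b"http://", b"www.", b".com", b".org", b"e", b"a"]
urls = [b"http://www.example.com", b"http://vldb.org"]

# Each string is compressed on its own against the same shared table.
compressed = [compress(u, SYMBOLS) for u in urls]

# Random access: decode only the second string, without touching the first.
second = decompress(compressed[1], SYMBOLS)
```

Because the table never changes while encoding, each compressed string depends only on the table and not on previously seen strings, so any single string can be decoded in isolation.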

doi.org/10.14778/3407790.3407851
46th International Conference on Very Large Data Bases
Database Architectures

Boncz, P., Neumann, T., & Leis, V. (2020). FSST: Fast random access string compression. In Proceedings of the VLDB Endowment (pp. 2649–2661). doi:10.14778/3407790.3407851