2010-03-01
Estimating the compression fraction of an index using sampling
Publication
Publication
Data compression techniques such as null suppression and dictionary compression are commonly used in today’s database systems. In order to effectively leverage compression, it is necessary to have the ability to efficiently and accurately estimate the size of an index if it were to be compressed. Such an analysis is critical if automated physical design tools are to be extended to handle compression. Several database systems today provide estimators for this problem based on random sampling. While this approach is efficient, there is no previous work that analyses its accuracy. In this paper, we analyse the problem of estimating the compressed size of an index from the point of view of worst-case guarantees. We show that the simple estimator implemented by several database systems has several “good” cases even though the estimator itself is agnostic to the internals of the specific compression algorithm. efficiently. The naïve method of actually building and compressing the index in order to estimate its size, while highly accurate is prohibitively inefficient. Thus, we need to be able to accurately estimate the compressed size of an index without incurring the cost of actually compressing it. This problem is challenging because the size of the compressed index can depend significantly on the data distribution as well as the compression technique used. This is in contrast with the estimation of the size of an uncompressed index in physical database design tools which can be derived in a straightforward manner from the schema (which defines the size of the corresponding column) and the number of rows in the table.
Additional Metadata | |
---|---|
IEEE | |
Cracking a Scientific Database | |
IEEE International Conference on Data Engineering | |
Organisation | Database Architectures |
Idreos, S., Kaushik, R., Narasayya, V., & Ramamurthy, R. (2010). Estimating the compression fraction of an index using sampling. In Proceedings of IEEE International Conference on Data Engineering 2010 (26). IEEE. |