Database cracking is a method to create partial indices as a side-effect of processing queries. Cracking efficiently smears out the cost of creating a full index over a stream of queries, creating an index that is overfitted to queried parts of the data. This core characteristic of cracking leads to unpredictable performance and unreliable convergence towards a full index. These problems are aggravated when considering updates and multidimensional queries. We envision a new indexing technique, Progressive Indexing that improves database cracking by strictly limiting perquery indexing cost to a budget (e.g., a user-defined fraction of scan costs), allowing the first and subsequent queries to complete without heavy penalties. At the same time, all indexing effort is spent towards predictable convergence towards a full index. We discuss different algorithms to deal with multidimensional queries and updates while maintaining a robust and convergent index, we then explore the research space and new challenges that arise from this new technique.

CEUR Workshop
VLDB PhD Workshop
Centrum Wiskunde & Informatica, Amsterdam, The Netherlands

Holanda, P.T. (2018). Progressive indices: Indexing without prejudice. In Proceedings of the VLDB 2018 PhD Workshop co-located with the 44th International Conference on Very Large Databases (VLDB 2018).