Progressive Indexes: Indexing for interactive data analysis
Interactive exploration of large volumes of data is increasingly common, as data scientists attempt to extract interesting information from large opaque data sets. This scenario presents a difficult challenge for traditional database systems, as (1) nothing is known about the query workload in advance, (2) the query workload is constantly changing, and (3) the system must provide interactive responses to the issued queries. This environment is challenging for index creation, because traditional database indexes must be built up front, which requires a priori knowledge of the workload to be effective.
In this paper, we introduce Progressive Indexing, a novel performance-driven indexing technique that focuses on automatic index creation while providing interactive response times to incoming queries. Its design gives each query a limited budget to spend on index creation, and this indexing budget is automatically tuned to each query before query processing. This allows the system to provide interactive answers to queries during index creation while remaining robust against various workload patterns and data distributions.
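To make the idea of a per-query indexing budget concrete, the following is a minimal sketch, loosely modeled on a quicksort-style partitioning step. It is not the authors' implementation. Assumptions: the budget is a fixed fraction `delta` of the column (the paper tunes it automatically per query), only a single pivot (the midpoint of the value range) is used, and only range-count queries are answered. The class and method names (`ProgressivePartitionIndex`, `count_range`) are hypothetical. Each query first copies up to its budget of unprocessed elements into an index array, placing values below the pivot at the front and the rest at the back. It then answers by scanning only the relevant indexed part plus the still-unindexed tail of the original column.

```python
from typing import List


class ProgressivePartitionIndex:
    """Hypothetical sketch: single pivot, fixed per-query budget `delta`."""

    def __init__(self, column: List[int], delta: float) -> None:
        self.column = column
        self.n = len(column)
        # Assumption: a fixed fraction of the column is indexed per query.
        self.budget = max(1, int(delta * self.n))
        # Assumption: pivot is the midpoint of the value range.
        self.pivot = (min(column) + max(column)) // 2 if column else 0
        self.index = [0] * self.n  # filled from both ends
        self.lo = 0                # next free slot for values < pivot
        self.hi = self.n - 1       # next free slot for values >= pivot
        self.consumed = 0          # elements of `column` already copied

    def _index_step(self) -> None:
        # Spend this query's budget copying unprocessed elements into the index.
        end = min(self.consumed + self.budget, self.n)
        for i in range(self.consumed, end):
            v = self.column[i]
            if v < self.pivot:
                self.index[self.lo] = v
                self.lo += 1
            else:
                self.index[self.hi] = v
                self.hi -= 1
        self.consumed = end

    def count_range(self, low: int, high: int) -> int:
        """Count values v with low <= v < high, indexing a little first."""
        self._index_step()
        count = 0
        # Left indexed part holds values < pivot: skip it if low >= pivot.
        if low < self.pivot:
            count += sum(1 for v in self.index[:self.lo] if low <= v < high)
        # Right indexed part holds values >= pivot: skip it if high <= pivot.
        if high > self.pivot:
            count += sum(1 for v in self.index[self.hi + 1:] if low <= v < high)
        # The unindexed tail of the original column still needs a full scan.
        count += sum(1 for v in self.column[self.consumed:] if low <= v < high)
        return count
```

For example, `ProgressivePartitionIndex([7, 2, 9, 4, 1, 8, 3, 6, 5, 0], delta=0.25).count_range(2, 6)` indexes two elements and returns 4. With each query, the unindexed tail shrinks, so more of the work is done on the partitioned array, where pivot-based pruning lets a query skip one side entirely. A full progressive index would keep refining the partitions (e.g. with further pivots) until the data is fully sorted.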
Journal: Proceedings of the VLDB Endowment
Holanda, P.T, Raasveldt, M, Manegold, S, & Mühleisen, H.F. (2019). Progressive Indexes: Indexing for interactive data analysis. Proceedings of the VLDB Endowment, 12(13), 2366–2378. doi:10.14778/3358701.3358705