Interactive exploratory data analysis consists of workloads that are composed of filter-aggregate queries with highly selective filters [1]. Hence, their performance is dependent on how much data they can skip during their scans, with indexes being the most efficient technique for aggressive data-skipping. Progressive Indexes are the state-of-the-art on automatic index creation for interactive exploratory data analysis. These indexes are partially constructed during query execution, eventually refining to a full index. However, progressive indexes have been designed for static databases, while in exploratory data analysis updates - usually batch-appends of newly acquired data - are frequent. In this paper, we propose Progressive Mergesort, a novel merging technique to make Progressive Indexes cope with updates. Progressive Mergesort differs from other merging techniques for partial indexes as it incorporates the index budget strategy design from Progressive Indexing. It follows the same three principles as Progressive Indexes: (1) fast query execution, (2) high robustness,(3) guaranteed convergence. Our experimental evaluation demonstrates that Progressive Mergesort is capable of achieving a 2x speedup when merging updates and up to 3 orders of magnitude lower variance than the state of the art.

doi.org/10.5441/002/edbt.2021.55
Advances in Database Technology - 24th International Conference on Extending Database Technology, EDBT 2021
Centrum Wiskunde & Informatica, Amsterdam (CWI), The Netherlands

Holanda, P., & Manegold, S. (2021). Progressive Mergesort: Merging batches of appends into progressive indexes. In Proceedings of the International Conference on Extending Database Technology (pp. 481–486). doi:10.5441/002/edbt.2021.55