Query optimization through the looking glass, and what we found running the Join Order Benchmark

Leis, Viktor; Radke, Bernhard; Gubichev, Andrey; Mirchev, Atanas; Boncz, Peter; Kemper, Alfons; Neumann, Thomas

doi:10.1007/s00778-017-0480-7

V. Leis (Viktor), B. Radke (Bernhard), A. Gubichev (Andrey), A. Mirchev (Atanas), P.A. Boncz (Peter), A. Kemper (Alfons) and T. Neumann (Thomas)

2018-10-01

VLDB Journal, Volume 27, Issue 5, p. 643-668

Finding a good join order is crucial for query performance. In this paper, we introduce the Join Order Benchmark that works on real-life data riddled with correlations and introduces 113 complex join queries. We experimentally revisit the main components in the classic query optimizer architecture using a complex, real-world data set and realistic multi-join queries. For this purpose, we describe cardinality-estimate injection and extraction techniques that allow us to compare the cardinality estimators of multiple industrial SQL implementations on equal footing, and to characterize the value of having perfect cardinality estimates. Our investigation shows that all industrial-strength cardinality estimators routinely produce large errors: though cardinality estimation using table samples solves the problem for single-table queries, there are still no techniques in industrial systems that can deal accurately with join-crossing correlated query predicates. We further show that while estimates are essential for finding a good join order, query performance is unsatisfactory if the query engine relies too heavily on these estimates. Using another set of experiments that measure the impact of the cost model, we find that it has much less influence on query performance than the cardinality estimates. We investigate plan enumeration techniques, comparing exhaustive dynamic programming with heuristic algorithms, and find that exhaustive enumeration improves performance despite the suboptimal cardinality estimates. Finally, we extend our investigation from main-memory-only processing to disk-based query processing. Here, we find that though accurate cardinality estimation should be the first priority, other aspects such as modeling random versus sequential I/O are also important to predict query runtime.
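The idea of injecting cardinalities into an exhaustive dynamic-programming enumerator can be illustrated with a small sketch. This is not the authors' implementation: the relation names, all cardinality values, and the cost model (the sum of intermediate result sizes) are assumptions chosen for illustration. The sketch picks a plan using erroneous estimates, then re-costs that plan with the true cardinalities to show how much a misestimate can hurt.

```python
from itertools import combinations

def best_plan(relations, card):
    """Exhaustive DP over all relation subsets (bushy trees, cross products allowed).

    card maps a frozenset of relation names to the cardinality of their join.
    Assumed cost model: sum of the cardinalities of all join results in the plan.
    Returns (cost, plan), where plan is a relation name or a (left, right) pair.
    """
    best = {frozenset([r]): (0.0, r) for r in relations}
    for size in range(2, len(relations) + 1):
        for subset in combinations(relations, size):
            s = frozenset(subset)
            candidates = []
            for k in range(1, size):
                for left in combinations(subset, k):
                    l = frozenset(left)
                    r = s - l
                    cost = best[l][0] + best[r][0] + card[s]
                    candidates.append((cost, (best[l][1], best[r][1])))
            best[s] = min(candidates, key=lambda c: c[0])
    return best[frozenset(relations)]

def relations_of(plan):
    """Set of base relations covered by a plan tree."""
    if isinstance(plan, str):
        return frozenset([plan])
    return relations_of(plan[0]) | relations_of(plan[1])

def plan_cost(plan, card):
    """Cost of a fixed plan tree under the given (injected) cardinalities."""
    if isinstance(plan, str):
        return 0.0
    left, right = plan
    return plan_cost(left, card) + plan_cost(right, card) + card[relations_of(plan)]

def fs(*names):
    return frozenset(names)

# Hypothetical numbers: the estimator badly underestimates the A-B join.
true_card = {fs("A", "B"): 1000, fs("A", "C"): 10, fs("B", "C"): 500, fs("A", "B", "C"): 50}
est_card = {fs("A", "B"): 5, fs("A", "C"): 100, fs("B", "C"): 500, fs("A", "B", "C"): 50}

relations = ["A", "B", "C"]
_, est_plan = best_plan(relations, est_card)      # plan chosen with estimates
est_cost_true = plan_cost(est_plan, true_card)    # what that plan really costs
opt_cost_true, opt_plan = best_plan(relations, true_card)

print("plan from estimates:", est_plan, "true cost:", est_cost_true)
print("plan from true cardinalities:", opt_plan, "true cost:", opt_cost_true)
```

Because the enumerator only sees the `card` mapping, swapping in a different estimator's numbers, or the true cardinalities, changes the chosen plan without touching the enumeration code. That separation is what makes a comparison "on equal footing" possible in this toy setting.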

Additional Metadata
Keywords	Query optimization, Join ordering, Cardinality estimation, Cost models
Persistent URL	doi.org/10.1007/s00778-017-0480-7
Journal	VLDB Journal
Organisation	Database Architectures
Citation	Leis, V., Radke, B., Gubichev, A., Mirchev, A., Boncz, P., Kemper, A., & Neumann, T. (2018). Query optimization through the looking glass, and what we found running the Join Order Benchmark. VLDB Journal, 27(5), 643–668. doi:10.1007/s00778-017-0480-7
