Exploiting cost distributions for query optimization
Large-scale query optimization is, besides its practical relevance, a hard test case for optimization techniques. Since exact methods cannot be applied due to the combinatorial explosion of the search space, heuristics and probabilistic strategies have been deployed for more than a decade. However, the results achieved are subject to discussion as several performance-critical effects still lack a sound explanation. In this paper we show how the cost distribution within these search spaces can be exploited. We analyze costing techniques and their principles to assess the difficulty of optimizing queries. Our investigation points out distinct characteristics of the cost distribution caused by the tree structure of the query plans. Applying basic stochastics to those distributions shows that random plan generation qualifies as a highly effective optimization technique under the precondition that every possible query plan may be generated with a positive probability. We present a run-time optimized algorithm meeting this requirement. Our experiments confirm the analytical results and demonstrate the algorithm's efficiency.