Recycling in Pipelined Query Evaluation
Presented at the IEEE International Conference on Data Engineering, Brisbane
Database systems typically execute queries in isolation. Sharing recurring intermediate and final results between successive query invocations is ignored or only exploited by caching final query results. The DBA is kept in the loop to make explicit sharing decisions by identifying and/or defining materialized views. Thus decisions are made only after a long time and sharing opportunities may be missed. Recycling intermediate results has been proposed as a method to make database query engines profit from opportunities to reuse fine-grained partial query results, that is fully autonomous and is able to continuously adapt to changes in the workload. The technique was recently revisited in the context of MonetDB, a system that by default materializes all intermediate results. Materializing intermediate results can consume significant system resources, therefore most other database systems avoid this where possible, following a pipelined query architecture instead. The novelty of this paper is to show how recycling can successfully be applied in pipelined query executors, by tracking the benefit of materializing possible intermediate results and then choosing the ones making best use of a limited intermediate result cache. We present ways to maximize the potential of recycling by leveraging subsumption and proactive query rewriting. We have implemented our approach in the Vectorwise database engine and have experimentally evaluated its potential using both synthetic and real-world datasets. Our results show that intermediate result recycling significantly improves performance.