A key assumption underlying query optimization schemes for parallel processing is that their cost models can deal with the multitude of effects encountered during the execution phase. Unfortunately, this is rarely the case and the optimal processing is only achieved in a few situations. In this paper we address the problem to achieve optimal processing under a pipelined execution strategy. The approach taken is based on a novel analytical framework---which establishes a formal treatment of both dataflow and processing environment---to validate execution strategies. The framework is based on the notion of $pi$-optimality which reflects an execution strategy's ability of ad-hoc resource utilization. $pi$-optimal strategies are insensitive to skew and provide a transparent interface to parallelism as they ensure a provable near-optimal exploitation of the processing environment. Finally, we discuss several strategies and present a $pi$-optimal execution strategy. Experiments carried out on an SMP verify our considerations: The new algorithm outperforms conventional pipelining execution substantially and is resistant against various kinds of skew.

Information Systems [INS]
Database Architectures

Manegold, S, Waas, F, & Kersten, M.L. (1998). On optimal pipeline processing in parallel query execution. Information Systems [INS]. CWI.