We introduce JCC-H, a drop-in replacement for the data and query generator of TPC-H that adds Join-Crossing-Correlations (JCC) and skew to its dataset and query workload. These correlations are carefully designed so that the filter predicates on table columns in the existing TPC-H queries can now affect the value, frequency, and join-fan-out distributions experienced by operators in the query plan. The query generator of JCC-H can generate parameter bindings for the 22 query templates in two different equivalence classes: query templates that receive “normal” parameters do not experience skew and behave very similarly to default TPC-H queries, whereas query templates expanded with “skewed” parameters experience strong join-crossing-correlations and skew in filter, aggregation, and join operations. In this paper we discuss the goals of JCC-H and its detailed design, and present initial experiments on both a single-server and an MPP database system that confirm that our design goals were largely met. In all, JCC-H provides a convenient way for any system that is already tested with TPC-H to examine how well it handles skew and correlations. We hope the community can use it to make progress on issues such as skew mitigation and the detection and exploitation of join-crossing-correlations in query optimizers and data storage.

Lecture Notes in Computer Science/Lecture Notes in Artificial Intelligence
TPC Technology Conference on Performance Evaluation and Benchmarking
Centrum Wiskunde & Informatica, Amsterdam (CWI), The Netherlands

Boncz, P.A., Anatiotis, A.-C., & Kläbe, S. (2017). JCC-H: Adding Join Crossing Correlations with skew to TPC-H. In Performance Evaluation and Benchmarking for the Analytics Era (pp. 103–119). doi:10.1007/978-3-319-72401-0_8