Reproducing TPC-DS qualification/reference results

TPC-DS ("Decision Support") is a standardized database benchmark. The benchmark toolkit contains reference ("qualification") queries and results. These are quite useful to check data management systems for bugs, as data, query and result are known and standardized.

However, as so often, reality is not so easy. The reference answers that ship with TPC-DS are quite broken, both from a syntactical and semantic perspective. In this repository, we present a cleaned and corrected set of TPC-DS queries and reference answers. We also include a tool (roundingdiff.py) that can automatically check sets of query answers against the reference in this repository, allowing some minor numerical drift.

Since there were serious issues with several reference results, we have fixed them by consensus (two ore more actual systems agreeing) where neccessary.

In addition, we have included results froms several popular data management systems for further reference, Oracle, SQL Server, DB2, PostgreSQL, HyPer and MonetDB. The plot below shows how these systems agree with the reference results in this repo. Some slides about the effort are available as well.

The qualification queries were created by replacing the qualification values from the TPC-DS spec in the query templates. In addition, we have reformulated the queries to be more compatible with a broad range of systems. Queries range from 1 to 99. The original query templates 14, 23, 24 and 39 contained two queries, they were split up in query 14a and 14b, etc. Where applicable, we use the "alternative" templates without the ROLLUP keyword. In addition, we have modified the queries in some places slightly to improve compatibility (e.g. rounding, || operator etc.).

Because databases cannot agree whether NULL values should be first or last in sorting order, there are two sets of reference results, answer_sets_nulls_first (Oracle, SQL Server and MonetDB) and answer_sets_nulls_last (DB2, PostgreSQL and HyPer).


Additional Files
View at GitHub