SparkFuzz: Searching correctness regressions in modern query engines

Ghit, Bogdan; Poggi, Nicolas; Rosen, Josh; Xin, Reynold; Boncz, Peter

doi:10.1145/3395032.3395327

B. Ghit (Bogdan), N. Poggi (Nicolas), J. Rosen (Josh), R. Xin (Reynold) and P.A. Boncz (Peter)

2020-06-19

Presented at the Workshop on Testing Database Systems, DBTest 2020 (June 2020), Portland

With more than 1200 contributors, Apache Spark is one of the most actively developed open source projects. At this scale and pace of development, mistakes are bound to happen. In this paper we present SparkFuzz, a toolkit we developed at Databricks for uncovering correctness errors in the Spark SQL engine. To guard the system against such errors, SparkFuzz takes a fuzzing approach to testing: it generates random data and queries, executes the generated queries on a reference database system such as PostgreSQL, and uses that system's results as a test oracle to verify the results returned by Spark SQL. We explain our approach to data and query generation and analyze the coverage of SparkFuzz. We show that SparkFuzz reaches its current maximum coverage relatively quickly, after generating only a small number of queries.
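The oracle-based comparison described in the abstract can be illustrated with a minimal sketch. It is not the SparkFuzz implementation: as assumptions made only to keep the example self-contained, an in-memory SQLite database stands in for the reference system, a small pure-Python filter stands in for the engine under test, and the generated queries are single-predicate filters over two integer columns. All function names (generate_rows, generate_query, run_reference, run_engine_under_test, fuzz) are hypothetical.

```python
import random
import sqlite3
from collections import Counter

# Hypothetical schema and operator set, chosen only for this sketch.
COLUMNS = ["a", "b"]
OPS = {
    "<": lambda x, y: x < y,
    "=": lambda x, y: x == y,
    ">": lambda x, y: x > y,
}


def generate_rows(rng, n):
    """Random data generation: n rows of small integers (no NULLs)."""
    return [(rng.randint(-5, 5), rng.randint(-5, 5)) for _ in range(n)]


def generate_query(rng):
    """Random query generation: one filter as (column, operator, constant)."""
    return rng.choice(COLUMNS), rng.choice(sorted(OPS)), rng.randint(-5, 5)


def run_reference(rows, query):
    """Reference system used as the oracle (SQLite here, as an assumption)."""
    col, op, const = query
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE t (a INTEGER, b INTEGER)")
    conn.executemany("INSERT INTO t VALUES (?, ?)", rows)
    # col and op come from the fixed lists above, so interpolation is safe.
    sql = f"SELECT a, b FROM t WHERE {col} {op} ?"
    result = conn.execute(sql, (const,)).fetchall()
    conn.close()
    return result


def run_engine_under_test(rows, query):
    """Stand-in for the engine being tested: a plain Python filter."""
    col, op, const = query
    idx = COLUMNS.index(col)
    return [row for row in rows if OPS[op](row[idx], const)]


def fuzz(seed, iterations, engine=run_engine_under_test):
    """Run random queries on both systems and collect every disagreement."""
    rng = random.Random(seed)
    mismatches = []
    for _ in range(iterations):
        rows = generate_rows(rng, 20)
        query = generate_query(rng)
        # Compare as multisets: row order is not guaranteed without ORDER BY.
        expected = Counter(run_reference(rows, query))
        actual = Counter(engine(rows, query))
        if expected != actual:
            mismatches.append((query, rows))
    return mismatches


if __name__ == "__main__":
    print(len(fuzz(seed=0, iterations=50)), "mismatches found")
```

Results are compared as multisets because a query without an ORDER BY clause does not fix the row order. A real setup would also have to reconcile dialect and semantic differences between the two systems, which this sketch sidesteps by restricting itself to integer comparisons without NULLs.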

Additional Metadata
Persistent URL	doi.org/10.1145/3395032.3395327
Conference	Workshop on Testing Database Systems, DBTest 2020
Organisation	Centrum Wiskunde & Informatica, Amsterdam (CWI), The Netherlands
Citation	Ghit, B., Poggi, N., Rosen, J., Xin, R., & Boncz, P. (2020). SparkFuzz: Searching correctness regressions in modern query engines. In Proceedings of the Workshop on Testing Database Systems, DBTest 2020 (pp. 1–6). doi:10.1145/3395032.3395327
