VectorH: taking SQL-on-Hadoop to the next level

Switakowski, M.; Costea, Andrei; Ionescu, Adrian; Raducanu, Bogdan; Bârca, C.; Sompolski, Juliusz; Łuszczak, A.; Szafranski, Michal; De Nijs, G.; Boncz, Peter

M. Switakowski, A. Costea (Andrei), A. Ionescu (Adrian), B. Raducanu (Bogdan), C. Bârca, J. Sompolski (Juliusz), A. Łuszczak, M. Szafranski (Michal), G. De Nijs and P.A. Boncz (Peter)

2016-06-01

Presented at the ACM SIGMOD International Conference on Management of Data, San Francisco

In this paper we describe VectorH, a new SQL-on-Hadoop system built on top of the fast Vectorwise analytical database system. VectorH achieves fault tolerance and scalable data storage by relying on HDFS, and extends the state of the art in SQL-on-Hadoop systems by instrumenting the HDFS block replication policy to ensure local reads under most circumstances. VectorH integrates with YARN for workload management, achieving a high degree of elasticity. Even though HDFS is an append-only filesystem and VectorH supports ordered table storage, VectorH can accommodate trickle updates through Positional Delta Trees (PDTs), a differential update structure that can be queried efficiently. We describe the main technical extensions to single-server Vectorwise that turned it into a Hadoop-based MPP system, in terms of workload management, parallel query optimization and execution, HDFS storage, transaction processing and Spark integration. In the evaluation section we compare VectorH with HAWQ, Impala, SparkSQL and Hive, showing orders of magnitude better performance than these competitors.

Additional Metadata
Keywords	Data storage, Query optimization, Parallel query execution, Cluster computing, Hadoop
THEME	Information (theme 2)
Stakeholder	Actian Corp., Amsterdam, Netherlands
Project	Actian CWI Research Grant
Conference	ACM SIGMOD International Conference on Management of Data
Grant	This work was funded by the CWI PPS samenwerking; grant id pps/05050504 - Actian CWI Research Grant
Organisation	Database Architectures
Citation	Switakowski, M., Costea, A., Ionescu, A., Raducanu, B., Bârca, C., Sompolski, J., … Boncz, P. (2016). VectorH: taking SQL-on-Hadoop to the next level.

