[Demo] Low-latency Spark queries on updatable data

Uta, Alexandru; Ghit, Bogdan; Dave, Ankur; Boncz, Peter

doi:10.1145/3299869.3320227

A. Uta (Alexandru), B. Ghit (Bogdan), A. Dave (Ankur) and P.A. Boncz (Peter)

2019-06-30

Presented at the ACM SIGMOD International Conference on Management of Data (June 2019), Amsterdam, The Netherlands

As data science is increasingly deployed in operational applications, it becomes important for data science frameworks to be able to perform computations in interactive, sub-second time. Indexing and caching are two key techniques that can make interactive query processing on large datasets possible. In this demo, we show the design, implementation and performance of a new indexing abstraction in Apache Spark, called the Indexed DataFrame. This is a cached DataFrame that incorporates an index to support fast lookup and join operations, and supports updates with multi-version concurrency. We demonstrate the Indexed DataFrame on a social network dataset using microbenchmarks and real-world graph processing queries, on datasets that are continuously growing.
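The idea in the abstract (a cached table with an index for fast lookups and joins, plus appends that readers see through versions) can be sketched in a few lines of plain Python. This is a toy, single-process illustration under stated assumptions, not the Indexed DataFrame API or its implementation: the class `IndexedTable`, its methods, and the `src`/`dst` column names are all hypothetical, and versioning is reduced to "a reader at version v sees only rows appended up to v".

```python
from collections import defaultdict


class IndexedTable:
    """Hypothetical in-memory stand-in for an indexed, append-only cached table."""

    def __init__(self, key_column):
        self.key_column = key_column
        self.rows = []                  # append-only row storage (the "cache")
        self.index = defaultdict(list)  # key value -> positions in self.rows
        self.version_ends = [0]         # version_ends[v] = row count at version v

    @property
    def version(self):
        """Latest committed version (number of append batches so far)."""
        return len(self.version_ends) - 1

    def append(self, new_rows):
        """Append a batch of rows, index them, and return the new version."""
        for row in new_rows:
            self.index[row[self.key_column]].append(len(self.rows))
            self.rows.append(row)
        self.version_ends.append(len(self.rows))
        return self.version

    def lookup(self, key, version=None):
        """Return rows with the given key that are visible at `version`."""
        if version is None:
            version = self.version
        limit = self.version_ends[version]
        # Rows appended after the snapshot have positions >= limit and are hidden.
        return [self.rows[pos] for pos in self.index.get(key, []) if pos < limit]

    def join(self, probe_rows, probe_column, version=None):
        """Index nested-loop join: one index probe per row of the other side."""
        return [
            {**match, **probe}
            for probe in probe_rows
            for match in self.lookup(probe[probe_column], version)
        ]


# Usage: an edge table of a growing graph, indexed on the source vertex.
edges = IndexedTable("src")
v1 = edges.append([{"src": 1, "dst": 2}, {"src": 1, "dst": 3}])
v2 = edges.append([{"src": 1, "dst": 4}, {"src": 2, "dst": 3}])

old_view = edges.lookup(1, version=v1)  # snapshot taken before the second batch
new_view = edges.lookup(1)              # latest version
```

A lookup touches only the rows listed under one key instead of scanning the whole table, and a reader holding an older version number keeps a stable view while new batches are appended. A real system would additionally have to partition the data across a cluster and coordinate concurrent readers and writers, which this sketch does not attempt.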

Additional Metadata
Persistent URL	doi.org/10.1145/3299869.3320227
Conference	ACM SIGMOD International Conference on Management of Data
Organisation	Centrum Wiskunde & Informatica, Amsterdam (CWI), The Netherlands
Citation	Uta, A., Ghit, B., Dave, A., & Boncz, P. (2019). [Demo] Low-latency Spark queries on updatable data. In Proceedings of the ACM International Conference on Management of Data (SIGMOD) (pp. 2009–2012). doi:10.1145/3299869.3320227

