Delta Lake: High-Performance ACID Table Storage over Cloud Object Stores

Armbrust, Michael; Das, Tathagata; Sun, Liwen; Yavuz, Burak; Zhu, Shixiong; Murthy, Mukul; Torres, Joseph; van Hovell, Herman; Ionescu, Adrian; Łuszczak, A.; Switakowski, M.; Szafranski, Michal; Li, Xiao; Ueshin, Takuya; Mokhtar, Mostafa; Boncz, Peter; Ghodsi, Ali; Paranjpye, Sameer; Senster, Pieter; Xin, Reynold; Zaharia, Matei

doi:10.14778/3415478.3415560

Cloud object stores such as Amazon S3 are some of the largest and most cost-effective storage systems on the planet, making them an attractive target to store large data warehouses and data lakes. Unfortunately, their implementation as key-value stores makes it difficult to achieve ACID transactions and high performance: metadata operations such as listing objects are expensive, and consistency guarantees are limited. In this paper, we present Delta Lake, an open source ACID table storage layer over cloud object stores initially developed at Databricks. Delta Lake uses a transaction log that is compacted into Apache Parquet format to provide ACID properties, time travel, and significantly faster metadata operations for large tabular datasets (e.g., the ability to quickly search billions of table partitions for those relevant to a query). It also leverages this design to provide high-level features such as automatic data layout optimization, upserts, caching, and audit logs. Delta Lake tables can be accessed from Apache Spark, Hive, Presto, Redshift and other systems. Delta Lake is deployed at thousands of Databricks customers that process exabytes of data per day, with the largest instances managing exabyte-scale datasets and billions of objects.
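To make the log-based design concrete, here is a minimal Python sketch of the core idea under simplifying assumptions. Each commit is a JSON-lines object named by a zero-padded version number under a `_delta_log/` prefix. The store is assumed to offer an atomic put-if-absent, which not every object store provides natively, so real deployments may need extra coordination. The table state is the set of data files left after replaying `add` and `remove` actions in version order. `InMemoryObjectStore`, `commit`, and `snapshot` are hypothetical, illustrative names, not the Delta Lake API. The sketch also omits the Parquet checkpoints, schema and metadata actions, and conflict-retry logic of the real system.

```python
import json

# Simplified, illustrative model of a Delta-style transaction log.
# Assumptions (not the Delta Lake API): commits are JSON-lines objects keyed
# by a zero-padded version number, and the store offers atomic put-if-absent.

LOG_PREFIX = "_delta_log/"


class InMemoryObjectStore:
    """Hypothetical stand-in for an object store with put-if-absent."""

    def __init__(self):
        self._objects = {}

    def put_if_absent(self, key, data):
        # Succeeds only for the first writer of a given key.
        if key in self._objects:
            return False
        self._objects[key] = data
        return True

    def get(self, key):
        return self._objects[key]

    def list_keys(self, prefix):
        # Zero-padded names make lexicographic order equal version order.
        return sorted(k for k in self._objects if k.startswith(prefix))


def log_key(version):
    return f"{LOG_PREFIX}{version:020d}.json"


def commit(store, version, actions):
    """Try to write commit `version`; returns False if another writer won."""
    body = "\n".join(json.dumps(a) for a in actions)
    return store.put_if_absent(log_key(version), body)


def snapshot(store, as_of=None):
    """Replay add/remove actions in version order to find the live data files.

    If `as_of` is given, stop after that version (time travel).
    """
    live = set()
    for key in store.list_keys(LOG_PREFIX):
        version = int(key[len(LOG_PREFIX):-len(".json")])
        if as_of is not None and version > as_of:
            break
        for line in store.get(key).splitlines():
            action = json.loads(line)
            if "add" in action:
                live.add(action["add"]["path"])
            elif "remove" in action:
                live.discard(action["remove"]["path"])
    return live


store = InMemoryObjectStore()
commit(store, 0, [{"add": {"path": "part-0.parquet"}}])
commit(store, 1, [{"add": {"path": "part-1.parquet"}}])
# Version 2 compacts the two small files into one, atomically.
commit(store, 2, [
    {"remove": {"path": "part-0.parquet"}},
    {"remove": {"path": "part-1.parquet"}},
    {"add": {"path": "part-2.parquet"}},
])
# A concurrent writer that also tries to create version 2 is rejected.
print(commit(store, 2, [{"add": {"path": "part-x.parquet"}}]))  # False
print(sorted(snapshot(store)))            # ['part-2.parquet']
print(sorted(snapshot(store, as_of=1)))   # ['part-0.parquet', 'part-1.parquet']
```

Time travel falls out of the same replay: reading with `as_of=1` simply stops before version 2. Replaying every commit from version 0 on each read is the cost that periodic Parquet checkpoints are meant to avoid. A fuller sketch would start replay from the latest checkpoint instead.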

Additional Metadata
Persistent URL	doi.org/10.14778/3415478.3415560
Conference	VLDB 2020
Organisation	Database Architectures
Citation	Armbrust, M., Das, T., Sun, L., Yavuz, B., Zhu, S., Murthy, M., … Zaharia, M. (2020). Delta Lake: High-performance ACID table storage over cloud object stores. In Proceedings of the VLDB Endowment (pp. 3411–3424). doi:10.14778/3415478.3415560
