Advances in Large-Scale RDF Data Management

Boncz, Peter; Erling, Orri; Pham, Minh-Duc

doi:10.1007/978-3-319-09846-3_2

One of the prime goals of the LOD2 project is improving the performance and scalability of RDF storage solutions so that the increasing amount of Linked Open Data (LOD) can be efficiently managed. Virtuoso has been chosen as the basic RDF store for the LOD2 project, and during the project it has been significantly improved by incorporating advanced relational database techniques from MonetDB and Vectorwise, turning it into a compressed column store with vectored execution. This has reduced the performance gap (“RDF tax”) between Virtuoso’s SQL and SPARQL query performance in a way that still respects the “schema-last” nature of RDF. However, lacking schema information, RDF database systems such as Virtuoso still cannot use advanced relational storage optimizations such as table partitioning or clustered indexes, and they have to execute SPARQL queries with many self-joins to a triple table, which leads to more join effort than needed in SQL systems. In this chapter, we first discuss the new column store techniques applied to Virtuoso and the enhancements in its cluster-parallel version, and then show its performance using the popular BSBM benchmark at the unsurpassed scale of 150 billion triples. We finally describe ongoing work on deriving an “emergent” relational schema from RDF data, which can help to close the performance gap between relational-based and RDF-based storage solutions.

Additional Metadata
Keywords	RDF, Data Management, Linked Data
THEME	Information (theme 2)
Stakeholder	Unspecified
Publisher	Springer
Persistent URL	doi.org/10.1007/978-3-319-09846-3_2
Project	LOD2 - Creating Knowledge out of Interlinked Data
Organisation	Database Architectures
Citation	Boncz, P., Erling, O., & Pham, M.-D. (2014). Advances in Large-Scale RDF Data Management. In Creating Knowledge Out of Interlinked Data. Springer. doi:10.1007/978-3-319-09846-3_2

Additional Files
22643B.pdf Author Manuscript, 467kb
Publisher Version
