The Researcher’s Guide to the Data Deluge: Querying a Scientific Database in Just a Few Seconds

Kersten, Martin; Idreos, Stratos; Manegold, Stefan; Liarou, Erietta

2011

Presented at the International Conference on Very Large Data Bases, Seattle, WA, USA

There is a clear need today for processing extremely large volumes of data. This is especially true in scientific data management, where data inputs on the order of multiple petabytes are expected soon. However, current data management technology is not suited to such data sizes. In light of these new database applications, we can rethink some of the strict requirements that database systems adopted in the past. We argue that correctness is one such requirement, and a major cause of performance degradation. In this paper, we propose a new paradigm for building database kernels that may produce "wrong but fast, cheap and indicative" results. Fast response times are an essential component of data analysis for exploratory applications: fast queries let the user develop a "feeling" for the data through a series of "painless" queries, which eventually leads to more detailed analysis in a targeted area of the data. We propose a research path in which a database kernel autonomously and on-the-fly decides to reduce the processing requirements of a running query based on workload, hardware and environmental parameters. This requires a complete redesign of database operators and of the query processing strategy. For example, typical and very common scenarios in which query processing performance degrades significantly are cases where a database operator has to spill data to disk, is forced to perform random accesses, or has to follow long linked lists. Here we ask the question: What if we simply avoid these steps, "ignoring" the side effect on the correctness of the result?
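As an illustration only, the idea of skipping a disk spill can be sketched as a group-by count operator with a fixed in-memory budget. The names `bounded_group_count`, `max_groups` and `ApproxResult` are hypothetical and not taken from the paper, which proposes a research path rather than a specific operator; the policy of dropping tuples that would open a new group is one assumed way to "avoid the step", not the authors' design.

```python
from dataclasses import dataclass


@dataclass
class ApproxResult:
    """Answer of a 'wrong but fast' operator plus a record of what it skipped."""
    groups: dict
    ignored_tuples: int = 0

    @property
    def exact(self) -> bool:
        # The answer is only guaranteed correct when no input was skipped.
        return self.ignored_tuples == 0


def bounded_group_count(keys, max_groups):
    """Count occurrences per key while keeping at most max_groups groups in memory.

    Hypothetical sketch: a conventional operator would spill its hash table
    to disk once memory runs out. This version instead ignores any tuple
    whose key would open a new group beyond the budget, so the cost stays
    bounded but the result may be incomplete.
    """
    counts = {}
    ignored = 0
    for key in keys:
        if key in counts:
            counts[key] += 1  # existing group: cheap, always counted
        elif len(counts) < max_groups:
            counts[key] = 1  # room left in the budget: open a new group
        else:
            ignored += 1  # budget exhausted: skip instead of spilling to disk
    return ApproxResult(groups=counts, ignored_tuples=ignored)


if __name__ == "__main__":
    data = ["a", "b", "a", "c", "b", "d", "a"]
    result = bounded_group_count(data, max_groups=2)
    # Keys "c" and "d" never get a slot, so the result is marked inexact.
    print(result.groups, result.ignored_tuples, result.exact)
    # prints {'a': 3, 'b': 2} 2 False
```

Reporting `ignored_tuples` alongside the answer is one way to keep a wrong result "indicative": the user can see how much input was skipped and decide whether a slower, exact rerun is worth it.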

Additional Metadata
THEME	Information (theme 2)
Conference	International Conference on Very Large Data Bases
Note	Challenges & Visions Track Best Paper Award.
Organisation	Database Architectures
Citation	Kersten, M., Idreos, S., Manegold, S., & Liarou, E. (2011). The Researcher’s Guide to the Data Deluge: Querying a Scientific Database in Just a Few Seconds. In Proceedings of International Conference on Very Large Data Bases 2011 (VLDB) (pp. 585–597).
