Instant-on scientific data warehouses: Lazy ETL for data-intensive research

Kargin, Yagiz; Pirk, Holger; Ivanova, Milena; Manegold, Stefan; Kersten, Martin

2012-08-01

Presented at the International Workshop on Business Intelligence for the Real Time Enterprise, Istanbul, Turkey

In the dawning era of data-intensive research, scientific discovery deploys data analysis techniques similar to those that drive business intelligence. As in classical Extract, Transform and Load (ETL) processes, data is loaded entirely from external data sources (repositories) into a scientific data warehouse before it can be analyzed. This process is both time- and resource-intensive, and it may not be entirely necessary if only a subset of the data is of interest to a particular user. To overcome this problem, we propose a novel technique to lower the costs of data loading: Lazy ETL. Data is extracted and loaded transparently, on the fly, and only for the required data items. Extensive experiments demonstrate a significant reduction in the time from source data availability to query answer compared to state-of-the-art solutions. In addition to reducing the costs of bootstrapping a scientific data warehouse, our approach also reduces the costs of loading new incoming data.
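The core idea of Lazy ETL, deferring extraction until a query actually touches a data item, can be illustrated with a small Python sketch. This is not the authors' implementation: the `LazyWarehouse` class, the `extract` callback, the per-file granularity, the row schema and the `wanted`/`predicate` filters are all hypothetical choices made for illustration. Registering a source file only records its name; its contents are extracted and cached the first time a query selects it.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Iterable, List, Set, Tuple

# Hypothetical row schema for illustration: (source_file, timestamp, value).
Row = Tuple[str, int, float]


@dataclass
class LazyWarehouse:
    """Toy warehouse that defers ETL until a query needs a source file."""

    # Hypothetical extractor: reads and transforms one source file into rows.
    extract: Callable[[str], List[Row]]
    # Names of files known to exist in the repository; registering is cheap.
    known_files: Set[str] = field(default_factory=set)
    # Cache of files that have already been extracted and loaded.
    loaded: Dict[str, List[Row]] = field(default_factory=dict)
    # Counts how many times the (expensive) extractor has actually run.
    extract_calls: int = 0

    def register(self, files: Iterable[str]) -> None:
        """Make source files visible to queries without reading their contents."""
        self.known_files.update(files)

    def _ensure_loaded(self, name: str) -> List[Row]:
        """Extract and load a file on first use; later calls hit the cache."""
        if name not in self.loaded:
            self.extract_calls += 1
            self.loaded[name] = self.extract(name)
        return self.loaded[name]

    def query(
        self,
        wanted: Callable[[str], bool],
        predicate: Callable[[Row], bool],
    ) -> List[Row]:
        """Load only the files selected by `wanted`, then filter their rows."""
        result: List[Row] = []
        for name in sorted(self.known_files):
            if wanted(name):
                result.extend(r for r in self._ensure_loaded(name) if predicate(r))
        return result


def fake_extract(name: str) -> List[Row]:
    # Stand-in for parsing a real repository file: three synthetic rows.
    return [(name, t, float(t)) for t in range(3)]


wh = LazyWarehouse(extract=fake_extract)
wh.register(["a.dat", "b.dat", "c.dat"])

# First query touches only b.dat, so only b.dat is extracted.
rows = wh.query(lambda f: f == "b.dat", lambda r: r[2] > 0.5)
print(rows)  # [('b.dat', 1, 1.0), ('b.dat', 2, 2.0)]

# A repeated query on the same file is served from the cache.
again = wh.query(lambda f: f == "b.dat", lambda r: r[1] == 0)

# New incoming data is registered without being loaded.
wh.register(["d.dat"])
print(wh.extract_calls)  # 1
```

Registering new files is cheap because nothing is read until a query selects them, which mirrors the abstract's point about loading new incoming data. A real system would also have to handle cache eviction and changes to already-loaded source files, which this sketch ignores.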

Additional Metadata
ACM	Systems (acm H.2.4)
THEME	Information (theme 2)
Project	Commit: Time Trails (P019), Data Management, Integration and Knowledge Discovery for Earth Observation Applications, The SciLens Infrastructure for Data Intensive Research
Conference	International Workshop on Business Intelligence for the Real Time Enterprise
Organisation	Database Architectures
Citation	Kargin, Y., Pirk, H., Ivanova, M., Manegold, S., & Kersten, M. (2012, August). Instant-on scientific data warehouses: Lazy ETL for data-intensive research. Proceedings of International Workshop on Business Intelligence for the Real Time Enterprise 2012.