Lazy ETL in Action: ETL Technology Dates Scientific Data
Both scientific data and business data have analytical needs. In the scientific setting, analysis usually starts only after a scientific data warehouse has been eagerly filled with all data from external data sources (repositories). This is similar to the initial loading stage of the Extract, Transform, and Load (ETL) processes that drive business intelligence, so ETL can also help scientific data analysis. However, the initial loading is a time- and resource-consuming operation, and it may not be entirely necessary, for example when the user is interested in only a subset of the data. We propose to demonstrate Lazy ETL, a technique that lowers the cost of initial loading by integrating ETL into the query processing of the scientific data warehouse. For each query, only the required data items are extracted, transformed, and loaded, transparently and on the fly. The demo is built around concrete implementations of Lazy ETL for seismic data analysis. The seismic data warehouse is ready for query processing without a long initial loading phase. The audience fires analytical queries and observes the internal mechanisms and modifications that realize each of the steps: lazy extraction, transformation, and loading.
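The idea of per-query extraction, transformation, and loading can be illustrated with a minimal Python sketch. It is not the implementation demonstrated in the paper: the `REPOSITORY` dictionary, the `LazyWarehouse` class, and its `query_max` method are hypothetical names introduced here, and an in-memory dictionary stands in for an external file repository.

```python
from typing import Callable, Dict, List

# Hypothetical stand-in for an external file repository:
# file name -> raw samples stored as text.
REPOSITORY: Dict[str, List[str]] = {
    "station_A.raw": ["1", "2", "3"],
    "station_B.raw": ["10", "20"],
    "station_C.raw": ["7"],
}


class LazyWarehouse:
    """Toy warehouse that runs ETL only for the files a query touches."""

    def __init__(self, repository: Dict[str, List[str]]) -> None:
        self.repository = repository
        self.loaded: Dict[str, List[int]] = {}  # file name -> transformed samples
        self.extractions = 0                    # number of files actually extracted

    def _ensure_loaded(self, name: str) -> List[int]:
        # Run extract, transform, and load for one file, at most once.
        if name not in self.loaded:
            raw = self.repository[name]          # extract
            values = [int(s) for s in raw]       # transform
            self.loaded[name] = values           # load
            self.extractions += 1
        return self.loaded[name]

    def query_max(self, wanted: Callable[[str], bool]) -> int:
        # Select the files the query needs, then load only those.
        # Raises ValueError if no file matches the predicate.
        names = [n for n in self.repository if wanted(n)]
        return max(v for n in names for v in self._ensure_loaded(n))
```

Under these assumptions, `LazyWarehouse(REPOSITORY).query_max(lambda n: n.startswith("station_B"))` touches only `station_B.raw`; the other two files are never extracted, and repeating the query reuses the already loaded values instead of extracting again.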
|Keywords||lazy ETL, scientific data analytics, scientific file repository management and exploration|
|Theme||Information (theme 2)|
|Project||Data Management, Integration and Knowledge Discovery, for Earth Observation Applications|
|Conference||International Conference on Very Large Data Bases|
Kargin, Y., Ivanova, M. G., Manegold, S., Kersten, M. L., & Zhang, Y. (2013). Lazy ETL in Action: ETL Technology Dates Scientific Data. In Proceedings of the International Conference on Very Large Data Bases 2013 (VLDB 39).