Scientists nowadays receive increasingly large volumes of data every day. These volumes, together with the metadata that describe them, are collected in scientific file repositories. Today's scientists need a data management tool that makes these file repositories accessible and performs a number of exploration steps near-instantly. Current database technology, however, has a long data-to-insight time and does not provide enough interactivity to shorten exploration time. We envision that exploiting metadata helps solve these problems. To this end, we propose a novel query execution paradigm that decomposes query execution into two stages: during the first stage, we process only metadata; during the second stage, we process the rest of the data. In this way, we can exploit metadata to boost interactivity and to transparently ingest only the data each query requires. Preliminary experiments show that up-front ingestion time is reduced by orders of magnitude, while query performance remains similar. Motivated by these results, we identify the challenges on the way from the new paradigm to efficient interactive data exploration.
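The two-stage idea can be illustrated with a minimal Python sketch. It assumes a repository in which every file carries metadata (here a hypothetical station name and covered time range) and in which a query's predicates split cleanly into metadata predicates and data predicates. `FileMeta`, `RAW_FILES`, `load_file` and `two_stage_query` are hypothetical names chosen for illustration, not the interface of the system described above. Stage 1 touches only the metadata catalog to select candidate files; stage 2 ingests just those files and evaluates the remaining predicates on the actual values.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

# Hypothetical metadata record for one file in a scientific repository:
# the station that produced it and the time range it covers.
@dataclass(frozen=True)
class FileMeta:
    path: str
    station: str
    start: int  # first timestamp covered by the file
    end: int    # last timestamp covered by the file

# Hypothetical in-memory stand-in for the raw data files; a real system
# would read these from disk only when stage 2 requests them.
RAW_FILES: Dict[str, List[Tuple[int, float]]] = {
    "a.dat": [(0, 1.0), (5, 2.5), (9, 0.5)],
    "b.dat": [(10, 3.0), (15, 4.0)],
    "c.dat": [(0, 7.0), (8, 9.0)],
}

ingested: List[str] = []  # records which files were actually loaded


def load_file(path: str) -> List[Tuple[int, float]]:
    """Lazily 'ingest' one file; only called during stage 2."""
    ingested.append(path)
    return RAW_FILES[path]


def two_stage_query(catalog: List[FileMeta], station: str, t_lo: int, t_hi: int,
                    value_pred: Callable[[float], bool]) -> List[Tuple[str, int, float]]:
    # Stage 1: evaluate only metadata predicates (matching station and a
    # time range overlapping [t_lo, t_hi]) to pick the files that can
    # contribute to the answer. No data is read here.
    candidates = [m for m in catalog
                  if m.station == station and m.start <= t_hi and m.end >= t_lo]
    # Stage 2: ingest just the candidate files and apply the remaining
    # predicates to the actual data values.
    result: List[Tuple[str, int, float]] = []
    for meta in candidates:
        for t, v in load_file(meta.path):
            if t_lo <= t <= t_hi and value_pred(v):
                result.append((meta.path, t, v))
    return result
```

For example, with a catalog in which `a.dat` and `b.dat` belong to station `ST1` and `c.dat` to station `ST2`, a query for `ST1` over the interval [4, 12] with the value predicate `v > 1.0` loads `a.dat` and `b.dat` but never touches `c.dat`, because stage 1 has already ruled it out from its metadata alone.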

Additional Metadata
Keywords Two-stage Query Execution, Data Exploration, Scientific Data
ACM Systems (acm H.2.4), Database Applications (acm H.2.8)
THEME Information (theme 2)
Publisher ACM
Project Commit: Time Trails (P019)
Conference SIGMOD/PODS PhD Symposium
Citation
Kargin, Y. (2013). Turning Scientists into Data Explorers. In SIGMOD'13 PhD Symposium Proceedings (pp. 25–30). ACM.