The aim of data exploration is to get acquainted with an unfamiliar database. Typically, explorers operate by trial and error: they submit a query, study the result, and then refine the query. In this paper, we investigate how to help them understand their query results. In particular, we focus on medium- to high-dimensional spaces: if the database contains dozens or hundreds of columns, which variables should they inspect? We propose to detect subspaces in which the user's selection differs from the rest of the database. From this idea, we built Ziggy, a tuple description engine. Ziggy can detect informative subspaces, and it can explain why it recommends them, with visualizations and natural language. It copes with mixed data and missing values, and it penalizes redundancy. Our experiments reveal that it is up to an order of magnitude faster than state-of-the-art feature selection algorithms, at a minimal cost in accuracy.
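
To make the core idea concrete, the following is a minimal sketch of ranking columns by how strongly a selection differs from the rest of a table. It is not Ziggy's algorithm: the function name rank_columns, the row format (a list of dicts), and the score (absolute difference of means divided by the pooled standard deviation) are all assumptions chosen for illustration. It handles only numeric columns one at a time, skips missing values, and does not penalize redundancy.

```python
from statistics import mean, pstdev


def rank_columns(rows, selected):
    """Rank numeric columns by how much the selection differs from the rest.

    rows     -- list of dicts mapping column name to a number or None (missing)
    selected -- set of row indices that belong to the user's selection

    Hypothetical helper for illustration only; not part of Ziggy.
    """
    columns = sorted({col for row in rows for col in row})
    scores = {}
    for col in columns:
        # Split the non-missing values of this column into selection and rest.
        inside = [r.get(col) for i, r in enumerate(rows)
                  if i in selected and r.get(col) is not None]
        outside = [r.get(col) for i, r in enumerate(rows)
                   if i not in selected and r.get(col) is not None]
        if not inside or not outside:
            continue  # nothing to compare
        spread = pstdev(inside + outside)
        if spread == 0:
            continue  # constant column carries no information
        # Assumed score: standardized difference between the two means.
        scores[col] = abs(mean(inside) - mean(outside)) / spread
    # Highest score first: these columns separate the selection best.
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


if __name__ == "__main__":
    rows = [
        {"age": 20, "income": 10, "height": 170},
        {"age": 22, "income": 12, "height": 180},
        {"age": 60, "income": 11, "height": 171},
        {"age": 62, "income": None, "height": 179},
    ]
    for column, score in rank_columns(rows, selected={2, 3}):
        print(column, round(score, 3))
```

A per-column score like this is cheap to compute, but it cannot find subspaces where only the combination of several columns separates the selection, and it ignores categorical data.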

Additional Metadata
Persistent URL dx.doi.org/10.1145/2949689.2949692
Project The SciLens-II Infrastructure, Big Data at work, Commit: Time Trails (P019)
Conference International Conference on Scientific and Statistical Database Management
Grant This work was funded by the Netherlands Organisation for Scientific Research (NWO); grant id nwo/621.016.201 - The Scilens-II Infrastructure, Big Data at work
Citation
Sellam, T.H.J., & Kersten, M.L. (2016). Fast, explainable view detection to characterize exploration queries. doi:10.1145/2949689.2949692