2015-04-01
The DBMS – your Big Data Sommelier
Publication
Presented at the IEEE International Conference on Data Engineering, Seoul
When addressing the problem of "big" data volume, preparation costs are one of the key challenges: the high costs of loading, aggregating and indexing data lead to a long data-to-insight time. In addition to being a nuisance to the end-user, this latency prevents real-time analytics on "big" data. Fortunately, data often comes in semantic chunks, such as files, whose data items share some characteristics such as acquisition time or location. A data management system that exploits this trait can significantly lower the data preparation costs and the associated data-to-insight time by only investing in the preparation of the relevant chunks. In this paper, we develop such a system as an extension of an existing relational DBMS (MonetDB). To this end, we develop a query processing paradigm and data storage model that are partial-loading aware. The result is a system that can make a 1.2 TB dataset (consisting of 4000 chunks) ready for querying in less than 3 minutes on a single server-class machine while maintaining good query processing performance.
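The idea of investing only in the relevant chunks can be illustrated with a minimal Python sketch. It is not the MonetDB extension described in the paper: the names `ChunkMeta`, `LazyChunkStore`, and `demo`, the `(timestamp, value)` record layout, and the use of a time range as the shared chunk characteristic are all assumptions made for illustration. The sketch registers only cheap per-chunk metadata up front, loads a chunk the first time a query's time range overlaps it, and keeps loaded chunks for later queries.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Iterable, List, Tuple

# Hypothetical record layout: (timestamp, value).
Row = Tuple[int, float]


@dataclass(frozen=True)
class ChunkMeta:
    """Cheap description of one chunk (for example, one file)."""
    name: str
    t_min: int  # smallest timestamp stored in the chunk
    t_max: int  # largest timestamp stored in the chunk


class LazyChunkStore:
    """Registers chunk metadata only; loads chunk contents on demand."""

    def __init__(self, metas: Iterable[ChunkMeta],
                 loader: Callable[[str], List[Row]]) -> None:
        self.metas = list(metas)
        self.loader = loader                    # stands in for the costly load step
        self.loaded: Dict[str, List[Row]] = {}  # chunks prepared so far

    def _rows(self, meta: ChunkMeta) -> List[Row]:
        # Pay the loading cost at most once per chunk.
        if meta.name not in self.loaded:
            self.loaded[meta.name] = self.loader(meta.name)
        return self.loaded[meta.name]

    def query(self, t_lo: int, t_hi: int) -> List[Row]:
        """Return all rows with t_lo <= timestamp <= t_hi."""
        result: List[Row] = []
        for meta in self.metas:
            # Skip chunks whose time range cannot overlap the query range.
            if meta.t_max < t_lo or meta.t_min > t_hi:
                continue
            result.extend(r for r in self._rows(meta) if t_lo <= r[0] <= t_hi)
        return result


def demo() -> Tuple[List[Row], List[str]]:
    # Three in-memory "files" stand in for chunks on disk.
    raw = {
        "a": [(0, 1.0), (5, 2.0)],
        "b": [(10, 3.0), (15, 4.0)],
        "c": [(20, 5.0), (25, 6.0)],
    }
    metas = [ChunkMeta("a", 0, 5), ChunkMeta("b", 10, 15), ChunkMeta("c", 20, 25)]
    store = LazyChunkStore(metas, lambda name: list(raw[name]))
    rows = store.query(12, 22)
    return rows, sorted(store.loaded)
```

Calling `demo()` runs one range query over three small in-memory chunks; chunk `a` lies entirely outside the queried range and is never loaded. A real system would additionally have to decide how to obtain the per-chunk metadata cheaply and how to integrate the deferred loading into query planning, which this sketch does not attempt.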
Additional Metadata | |
---|---|
IEEE | |
doi.org/10.1109/ICDE.2015.7113361 | |
Commit: Time Trails (P019) | |
IEEE International Conference on Data Engineering | |
Organisation | Database Architectures |
Kargin, Y., Kersten, M., Manegold, S., & Pirk, H. (2015). The DBMS – your Big Data Sommelier. In Proceedings of IEEE International Conference on Data Engineering 2015 (ICDE 31). IEEE. doi:10.1109/ICDE.2015.7113361 |