The rise of Data Science has caused an influx of new usersin need of data management solutions. However, insteadof utilizing existing RDBMS solutions they are opting touse a stack of independent solutions for data storage andprocessing glued together by scripting languages. This is notbecause they do not need the functionality that an integratedRDBMS provides, but rather because existing RDBMS im-plementations do not cater to their use case. To solve theseissues, we propose a new class of data management systems:embedded analytical systems. These systems are tightlyintegrated with analytical tools, and provide fast and effi-cient access to the data stored within them. In this work,we describe the unique challenges and opportunities w.r.tworkloads, resilience and cooperation that are faced by thisnew class of systems and the steps we have taken towardsaddressing them in the DuckDB system.

Biennial Conference on Innovative Data Systems Research
Database Architectures

Raasveldt, M., & Mühleisen, H. (2020). Data Management for Data Science - Towards Embedded Analytics. In Proceedings of the Conference on Innovative Data Systems Research.