Statistical analysts have long struggled with ever-growing data volumes. While specialized data management systems such as relational databases would be able to handle the data, statistical analysis tools are far more convenient for expressing complex data analyses. An integration of these two classes of systems has the potential to overcome the data management issue while at the same time keeping analysis convenient. However, one must keep a careful eye on implementation overheads such as serialization. In this paper, we propose the in-process integration of data management and analytical tools. Furthermore, we argue that a zero-copy integration is feasible due to the omnipresence of C-style arrays containing native types. We discuss the general concept and present a prototype of this integration based on the columnar relational database MonetDB and the R environment for statistical computing. We evaluate the performance of this prototype in a series of micro-benchmarks of common data management tasks.
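
The contrast between serialization and zero-copy sharing can be illustrated with a minimal sketch. It is not the prototype described in the paper, which integrates MonetDB and R; it only assumes that a column is stored as a contiguous C-style array of native doubles. The names `column`, `share_zero_copy`, and `share_by_serialization` are hypothetical and chosen for illustration.

```python
from array import array

# Hypothetical stand-in for a database column:
# a contiguous C-style array of native doubles.
column = array("d", [1.5, 2.5, 3.0, 4.0])


def share_zero_copy(col):
    """Expose the column's underlying buffer without copying it.

    This is the in-process analogue of handing over a pointer.
    """
    return memoryview(col)


def share_by_serialization(col):
    """Copy the column through a byte string, as a client
    talking to the database over a socket would have to do."""
    return array("d", col.tobytes())


view = share_zero_copy(column)
copy = share_by_serialization(column)

# A write to the original column is visible through the
# zero-copy view, but not through the serialized copy.
column[0] = 100.0
shared_first = view[0]
copied_first = copy[0]
total = sum(view)
```

Because the view and the column share one buffer, no data is moved when the "analysis side" reads it. The serialized copy pays for a full pass over the data and then holds its own independent snapshot.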

C.S. Jensen, H. Lu, T.B. Pedersen (Torben Bach), C. Thomsen, K. Torp
Commit: Time Trails (P019)
International Conference on Scientific and Statistical Database Management
Database Architectures

Lajus, J., & Mühleisen, H. (2014). Efficient Data Management and Statistics with Zero-Copy Integration. In C. S. Jensen, H. Lu, T. B. Pedersen, C. Thomsen, & K. Torp (Eds.), International Conference on Scientific and Statistical Database Management. doi:10.1145/2618243.2618265