Statistical analysts have long struggled with ever-growing data volumes. While specialized data management systems such as relational databases can handle the data, statistical analysis tools are far more convenient for expressing complex data analyses. An integration of these two classes of systems has the potential to overcome the data management issue while keeping analysis convenient. However, one must keep a careful eye on implementation overheads such as serialization. In this paper, we propose the in-process integration of data management and analytical tools. Furthermore, we argue that a zero-copy integration is feasible due to the omnipresence of C-style arrays containing native types. We discuss the general concept and present a prototype of this integration based on the columnar relational database MonetDB and the R environment for statistical computing. We evaluate the performance of this prototype in a series of micro-benchmarks of common data management tasks.
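
The zero-copy idea in the abstract can be illustrated with a minimal sketch. It is not the paper's MonetDB/R prototype: it uses Python's standard library only, and the names `column`, `view`, and `mean` are hypothetical. It assumes a column stored as a contiguous C-style array of native doubles and shows a second consumer reading that same memory through a view rather than through a serialized copy.

```python
from array import array

# A "column" held as a contiguous C-style array of native doubles.
# The values are made-up example data, not taken from the paper.
column = array("d", [1.5, 2.5, 3.0, 4.0])

# memoryview exposes the array's underlying buffer without copying it.
view = memoryview(column)


def mean(values):
    """Arithmetic mean over any sequence or buffer of numbers."""
    return sum(values) / len(values)


# The "analysis side" computes directly on the shared buffer.
result = mean(view)

# A write through the owning array is visible through the view,
# which confirms both names refer to the same memory.
column[0] = 9.5
shared = view[0] == 9.5
```

Because both sides agree on the memory layout (a flat array of a native type), no conversion or serialization step is needed; this is the property the abstract relies on when it argues that zero-copy integration is feasible.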

Additional Metadata
ACM Database Applications (acm H.2.8)
THEME Information (theme 2)
Stakeholder Unspecified
Editor C.S. Jensen, H. Lu, T.B. Pedersen (Torben Bach), C. Thomsen, K. Torp
Persistent URL dx.doi.org/10.1145/2618243.2618265
Project Commit: Time Trails (P019)
Conference International Conference on Scientific and Statistical Database Management
Citation
Lajus, J., & Mühleisen, H.F. (2014). Efficient Data Management and Statistics with Zero-Copy Integration. In C.S. Jensen, H. Lu, T.B. Pedersen, C. Thomsen, & K. Torp (Eds.), International Conference on Scientific and Statistical Database Management. doi:10.1145/2618243.2618265