Cache Conscious Data Layouting for In-Memory Databases
Many applications with manually implemented data management exhibit a data storage pattern in which semantically related data items are stored closer in memory than unrelated data items. The strong sematic relationship between these data items commonly induces contemporary accesses to them. This is called the principle of data locality and has been recognized by hardware vendors. It is commonly exploited to improve the performance of hardware. General Purpose Database Management Systems (DBMSs), whose main goal is to simplify optimal data storage and processing, generally fall short of this claim because the usage pattern of the stored data cannot be anticipated when designing the system. The current interest in column oriented databases indicates that one strategy does not fit all applications. A DBMS that automatically adapts it’s storage strategy to the workload of the database promises a significant performance increase by maximizing the benefit of hardware optimizations that are based on the principle of data locality. This thesis gives an overview of optimizations that are based on the principle of data locality and the effect they have on the data access performance of applications. Based on the findings, a model is introduced that allows an estimation of the costs of data accesses based on the arrangement of the data in the main memory. This model is evaluated through a series of experiments and incorporated into an automatic layouting component for a DBMS. This layouting component allows the calculation of an analytically optimal storage layout. The performance benefits brought by this component are evaluated in an application benchmark.