NUMA obliviousness through memory mapping
Presented at the DAMON Workshop (colocated with ACM SIGMOD), Melbourne
With the rise of multi-socket, multi-core CPUs, considerable effort is going into how best to exploit their abundant compute power. In a shared-memory setting, each socket is equipped with its own memory module and reaches the memory modules of other sockets with non-uniform memory access (NUMA) cost. Memory access across sockets is relatively expensive compared to memory access within a socket. A common way to minimize cross-socket memory access is to partition the data so that data affinity is maintained per socket. In this paper we explore the role of memory-mapped storage in providing transparent data access in a NUMA environment, without the need for explicit data partitioning. We compare the performance of a database engine in a distributed setting in a multi-socket environment with that of a database engine in a NUMA-oblivious setting. We show that although the operating system tries to keep data affinity local to each socket, significant remote memory access still occurs as the number of threads increases. Hence, setting explicit process and memory affinity results in robust execution of NUMA-oblivious plans. We use micro-experiments and SQL queries from the TPC-H benchmark to provide an in-depth experimental exploration of the landscape on a four-socket Intel machine.
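As a rough illustration of the two ideas named in the abstract, the sketch below maps a file into memory and reads it without any explicit partitioning, and separately restricts the process to a chosen set of CPUs. It is not the database engine's implementation: the function names `pin_to_cpus` and `sum_mapped_bytes` are hypothetical, and it relies only on Python's standard `mmap` and `os` modules. `os.sched_setaffinity` is available on Linux only and controls CPU placement, not memory placement; a memory policy would have to be set separately, for example with a tool such as `numactl`.

```python
import mmap
import os


def pin_to_cpus(cpus):
    """Restrict the calling process to the given CPU ids, where supported.

    Hypothetical helper. Returns the resulting affinity set, or None on
    platforms that lack os.sched_setaffinity (it is Linux-specific).
    """
    if not hasattr(os, "sched_setaffinity"):
        return None
    allowed = os.sched_getaffinity(0)
    # Only request CPUs the process is already permitted to run on.
    target = set(cpus) & allowed
    if target:
        os.sched_setaffinity(0, target)
    return os.sched_getaffinity(0)


def sum_mapped_bytes(path):
    """Map a non-empty file read-only and sum its bytes.

    Hypothetical helper. The file is accessed through one mapping; which
    physical pages back it, and on which socket, is left to the OS.
    """
    with open(path, "rb") as f:
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as view:
            # Indexing an mmap object yields ints, one per byte.
            return sum(view[i] for i in range(len(view)))
```

Calling `pin_to_cpus` before `sum_mapped_bytes` keeps the reading threads of this process on the chosen CPUs, which is the process-affinity half of what the abstract describes; whether the mapped pages end up local to those CPUs still depends on the operating system's placement policy.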
|ACM SIGMOD Record|
|Commit: Time Trails (P019)|
Gawade, M.M., & Kersten, M.L. (2015). NUMA obliviousness through memory mapping. In ACM SIGMOD Record. ACM. doi:10.1145/2771937.2771948