Indexed Foreign-Key Joins expose a very asymmetric access pattern: the Foreign-Key Index is scanned sequentially, whilst the Primary-Key table is the target of many quasi-random lookups, which are the dominant cost factor. To reduce the cost of these random lookups, the fact table can be (re-)partitioned at runtime to increase access locality on the dimension table, and thus confine the random memory accesses to the CPU's cache. However, this is very hard to optimize, and the performance impact on recent architectures is limited because the partitioning costs consume most of the achievable join improvement. GPGPUs, on the other hand, have an architecture that is well suited for this operation: a relatively slow connection to the large system memory and a very fast connection to the smaller internal device memory. We show how to accelerate Foreign-Key Joins by executing the random table lookups on the GPU's VRAM while sequentially streaming the Foreign-Key Index through the PCI-E bus. We also experimentally study the memory access costs on GPU and CPU to estimate the benefit of this technique.
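To make the three access patterns in the abstract concrete, here is a minimal CPU-only sketch in Python. It is an illustrative model, not the authors' implementation: it assumes the Foreign-Key Index is a list of row positions into the Primary-Key table, and the function names `indexed_fk_join`, `partitioned_fk_join` and `streamed_fk_join` are hypothetical. The "resident" copy and the fixed-size chunks in `streamed_fk_join` merely stand in for GPU device memory and PCI-E transfers. The code reproduces the data flow, but not the cache or bus costs that the paper measures.

```python
from typing import List, Sequence, Tuple


def indexed_fk_join(fk_index: Sequence[int], pk_payload: Sequence[int]) -> List[int]:
    # The FK index is read front to back (sequential access), but each entry
    # is a row position into the PK table, so the lookups jump around (random).
    return [pk_payload[pos] for pos in fk_index]


def partitioned_fk_join(fk_index: Sequence[int], pk_payload: Sequence[int],
                        bits: int) -> List[int]:
    # Radix-partition the FK entries on the high-order bits of their target
    # position, so that each partition touches only one contiguous slice of
    # the PK table (ideally small enough to stay cache-resident).
    n = len(pk_payload)
    shift = max(max(n - 1, 0).bit_length() - bits, 0)
    buckets: List[List[Tuple[int, int]]] = [[] for _ in range(1 << bits)]
    for i, pos in enumerate(fk_index):  # this extra pass is the partitioning cost
        buckets[pos >> shift].append((i, pos))
    # Join partition by partition, writing each result back to its original
    # slot so the output order matches the unpartitioned join.
    result: List[int] = [0] * len(fk_index)
    for bucket in buckets:
        for i, pos in bucket:
            result[i] = pk_payload[pos]
    return result


def streamed_fk_join(fk_index: Sequence[int], pk_payload: Sequence[int],
                     chunk_size: int) -> List[int]:
    # Simulation of the GPU scheme: the PK table is copied once and stays
    # resident (standing in for VRAM), while the FK index is shipped over in
    # fixed-size chunks (standing in for sequential PCI-E transfers).
    resident_pk = list(pk_payload)
    result: List[int] = []
    for start in range(0, len(fk_index), chunk_size):
        chunk = fk_index[start:start + chunk_size]          # sequential "transfer"
        result.extend(resident_pk[pos] for pos in chunk)  # random lookups on resident copy
    return result


if __name__ == "__main__":
    pk = [10 * k for k in range(16)]  # PK payload column: row k holds 10*k
    fk = [3, 15, 0, 7, 7, 12, 1]      # FK index: row positions into pk
    print(indexed_fk_join(fk, pk))    # [30, 150, 0, 70, 70, 120, 10]
    print(partitioned_fk_join(fk, pk, bits=2) == indexed_fk_join(fk, pk))  # True
    print(streamed_fk_join(fk, pk, chunk_size=3) == indexed_fk_join(fk, pk))  # True
```

All three variants produce the same join result. The partitioned variant buys locality with an additional full pass over the FK index. The abstract argues that, on recent CPUs, this pass eats most of the gain. The streamed variant avoids that pass by keeping the randomly accessed table in the faster memory.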

Data Management, Integration and Knowledge Discovery for Earth Observation Applications
VLDB Workshop on Accelerating Data Management Systems Using Modern Processor and Storage Architectures
Database Architectures

Pirk, H., Manegold, S., & Kersten, M. (2011). Accelerating Foreign-Key Joins using Asymmetric Memory Channels. In Proceedings of the International Conference on Very Large Data Bases 2011 (VLDB) (pp. 585–597).