A spinning join that does not get dizzy
Presented at the International Conference on Distributed Computing Systems, Genoa, Italy
As network infrastructures with 10 Gb/s bandwidth and beyond have become pervasive and as cost advantages of large commodity-machine clusters continue to increase, research and industry strive to exploit the available processing performance for large-scale database processing tasks. In this work we look at the use of high-speed networks for distributed join processing. We propose Data Roundabout as alight weight transport layer that uses Remote Direct Memory Access (RDMA) to gain access to the throughput opportunities in modern networks. The essence of Data Roundabout is a ring shaped network in which each host stores one portion of a large database instance. We leverage the available bandwidth to (continuously) pump data through the high-speed network. Based on Data Roundabout, we demonstrate cyclo-join, which exploits the cycling flow of data to execute distributed joins. The study uses different join algorithms (hash join and sort-merge join) to expose the pitfalls and the advantages of each algorithm in the data cycling arena. The experiments show the potential of a large distributed main-memory cache glued together with RDMA into a novel distributed database architecture.
|IEEE Computer Society|
|International Conference on Distributed Computing Systems|
Frey, P.W, Pereira Goncalves, R.A, Kersten, M.L, & Teubner, J. (2010). A spinning join that does not get dizzy. In Proceedings of International Conference on Distributed Computing Systems 2010 (ICDCS 30) (pp. 283–292). IEEE Computer Society.