Distributed processing commonly requires data to be spread across machines using a priori static or hash-based allocation. In this paper, we explore an alternative approach that starts from a master node in control of the complete database and a variable number of worker nodes for delegated query processing. Data is shipped just-in-time to the worker nodes under a need-to-know policy and is reused, where possible, in subsequent queries. A bidding mechanism among the workers yields a schedule that makes the most efficient reuse of previously shipped data, minimizing data transfer costs. Just-in-time data shipment allows our system to benefit from locally available idle resources to boost overall performance. The system is maintenance-free, and allocation is fully transparent to users. Our experiments show that the proposed adaptive distributed architecture is a viable and flexible alternative for small-scale MapReduce-type settings.
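
The bidding idea in the abstract can be illustrated with a minimal sketch. It is not the paper's implementation: the `Worker` class, the `schedule` function, the fragment names, and the cost model (a bid equals the bytes still to be shipped, and the lowest bid wins) are all assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Worker:
    """Hypothetical worker node that remembers which fragments it already holds."""
    name: str
    cached: set = field(default_factory=set)

    def bid(self, needed, sizes):
        # Assumed cost model: the bid is the number of bytes that would
        # still have to be shipped to this worker (lower is better).
        return sum(sizes[f] for f in needed - self.cached)


def schedule(query_fragments, workers, sizes):
    """Assign a query to the lowest bidder and ship only the missing fragments."""
    needed = set(query_fragments)
    # Ties are broken by list order, because min() returns the first minimum.
    winner = min(workers, key=lambda w: w.bid(needed, sizes))
    shipped = needed - winner.cached
    winner.cached |= shipped  # shipped data stays available for later queries
    return winner.name, sum(sizes[f] for f in shipped)


# Hypothetical fragment sizes (in arbitrary units) and two idle workers.
sizes = {"a": 10, "b": 20, "c": 5}
workers = [Worker("w1"), Worker("w2")]

first = schedule(["a", "b"], workers, sizes)   # nothing cached yet
second = schedule(["b", "c"], workers, sizes)  # w1 already holds "b"
```

In this sketch the second query goes to the worker that already holds fragment `b`, so only `c` is transferred. A real scheduler would presumably also weigh worker load and cache eviction, which are ignored here.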

Springer
Commit: Time Trails (P019), Data Management, Integration and Knowledge Discovery for Earth Observation Applications
East-European Conference on Advances in Databases and Information Systems
Database Architectures

Ivanova, M., Kersten, M., & Groffen, F. (2012). Just-in-time Data Distribution for Analytical Query Processing. In Proceedings of East-European Conference on Advances in Databases and Information Systems 2012 (16). Springer.