Global-scale grids offer a massive source of processing power and thus the means to support processor-intensive parallel applications. Because the available processing and network resources are highly bursty and unpredictable, applications must be made robust against the dynamics of grid environments. The two techniques best suited to coping with the dynamic nature of the grid are dynamic load balancing (DLB) and job replication (JR). In this paper, we analyze and compare the effectiveness of these two approaches by means of trace-driven simulations. We observe that there exists an easy-to-measure statistic Y and a corresponding threshold value Y* such that DLB consistently outperforms JR when Y > Y*, whereas the reverse is true for Y < Y*. Based on this observation, we propose a simple, easy-to-implement approach, referred to throughout as the DLB/JR method, that dynamically decides whether to use DLB or JR. Extensive simulations based on a large set of real data monitored in a global-scale grid show that the DLB/JR method consistently performs at least as well as both DLB and JR in all circumstances, which makes it highly robust against the unpredictable nature of global-scale grids.
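The threshold rule described in the abstract can be illustrated with a minimal sketch. The abstract does not define the statistic Y, give the value of Y*, or say what happens when Y equals Y*, so the function below takes both values as inputs and falls back to JR on a tie purely as an assumption. The names `Strategy` and `choose_strategy` are hypothetical and do not come from the paper.

```python
from enum import Enum


class Strategy(Enum):
    """The two techniques compared in the abstract."""
    DLB = "dynamic load balancing"
    JR = "job replication"


def choose_strategy(y: float, y_star: float) -> Strategy:
    """Return DLB when the measured statistic exceeds the threshold, else JR.

    y      -- current value of the statistic Y (how it is measured is not
              specified in the abstract)
    y_star -- threshold value Y* (its value is not given in the abstract)
    """
    # Assumption: the abstract only covers Y > Y* and Y < Y*;
    # choosing JR when Y == Y* is an arbitrary choice made here.
    return Strategy.DLB if y > y_star else Strategy.JR
```

Re-evaluating such a function whenever a fresh measurement of Y becomes available would give the kind of dynamic switching between DLB and JR that the abstract attributes to the DLB/JR method.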

IEEE Transactions on Parallel and Distributed Systems
Centrum Wiskunde & Informatica, Amsterdam (CWI), The Netherlands

Dobber, M.A., Koole, G.M., & van der Mei, R.D. (2009). Dynamic Load Balancing and Job Replication in a Global-Scale Grid Environment: A Comparison. IEEE Transactions on Parallel and Distributed Systems, 20(2), 207–218. doi:10.1109/TPDS.2008.61