Recent years have witnessed phenomenal growth in the application and capabilities of Graphics Processing Units (GPUs) due to their high parallel computation power at relatively low cost. However, writing a computationally efficient GPU program (kernel) is challenging, and generally only certain specific kernel configurations lead to significant increases in performance. Auto-tuning is the process of automatically optimizing software for highly efficient execution on a target hardware platform. Auto-tuning is particularly useful for GPU programming, as a single kernel requires re-tuning after code changes, for different input data, and for different architectures. However, the discrete and non-convex nature of the search space creates a challenging optimization problem. In this work, we investigate which algorithm produces the fastest kernels as the time budget for the tuning task is varied. We conduct a survey by performing experiments on 26 different kernel spaces, from 9 different GPUs, for 16 different evolutionary black-box optimization algorithms. We then analyze these results and introduce a novel metric based on the PageRank centrality concept as a tool for gaining insight into the difficulty of the optimization problem. We demonstrate that our metric correlates strongly with observed tuning performance.

IEEE Transactions on Evolutionary Computation
Real-Time 3D Tomography, the Center for Optimal, Real-Time Machine Studies of the Explosive Universe

Schoonhoven, R., van Werkhoven, B., & Batenburg, J. (2022). Benchmarking optimization algorithms for auto-tuning GPU kernels. IEEE Transactions on Evolutionary Computation. doi:10.1109/TEVC.2022.3210654