2020-07-13
Structure adaptive algorithms for stochastic bandits
Publication
Publication
We study reward maximisation in a wide class of structured stochastic multi-armed bandit problems, where the mean rewards of arms satisfy some given structural constraints, e.g. linear, unimodal, sparse, etc. Our aim is to develop methods that are \emph{flexible} (in that they easily adapt to different structures), \emph{powerful} (in that they perform well empirically and/or provably match instance-dependent lower bounds) and \emph{efficient} in that the per-round computational burden is small. We develop asymptotically optimal algorithms from instance-dependent lower-bounds using iterative saddle-point solvers. Our approach generalises recent iterative methods for pure exploration to reward maximisation, where a major challenge arises from the estimation of the sub-optimality gaps and their reciprocals. Still we manage to achieve all the above desiderata. Notably, our technique avoids the computational cost of the full-blown saddle point oracle employed by previous work, while at the same time enabling finite-time regret bounds. Our experiments reveal that our method successfully leverages the structural assumptions, while its regret is at worst comparable to that of vanilla UCB.
Additional Metadata | |
---|---|
Proceedings of Machine Learning Research | |
37th International Conference on Machine Learning, ICML 2020 | |
Organisation | Machine Learning |
Degenne, R., Shao, H., & Koolen-Wijkstra, W. (2020). Structure adaptive algorithms for stochastic bandits. In Proceedings of the International Conference on Machine Learning (pp. 2443–2452). |