Structure adaptive algorithms for stochastic bandits

Degenne, Rémy; Shao, Han; Koolen-Wijkstra, Wouter

2020-07-13

Presented at the 37th International Conference on Machine Learning, ICML 2020 (July 2020), Online

We study reward maximisation in a wide class of structured stochastic multi-armed bandit problems, where the mean rewards of the arms satisfy given structural constraints, e.g. linear, unimodal or sparse. Our aim is to develop methods that are flexible (they easily adapt to different structures), powerful (they perform well empirically and/or provably match instance-dependent lower bounds) and efficient (the per-round computational burden is small). We develop asymptotically optimal algorithms from instance-dependent lower bounds using iterative saddle-point solvers. Our approach generalises recent iterative methods for pure exploration to reward maximisation, where a major challenge arises from estimating the sub-optimality gaps and their reciprocals. Still, we achieve all of the above desiderata. Notably, our technique avoids the computational cost of the full-blown saddle-point oracle employed by previous work, while at the same time enabling finite-time regret bounds. Our experiments reveal that our method successfully leverages the structural assumptions, while its regret is at worst comparable to that of vanilla UCB.
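For context, the instance-dependent lower bound mentioned above is, in its standard form due to Graves and Lai, stated below; the notation is ours and not necessarily the paper's. Assume a structure class M (the set of allowed mean vectors over K arms), a bandit mu in M with a unique optimal arm a*(mu), gaps Delta_a = mu_{a*} - mu_a, regret R_T = sum_{t<=T} (mu_{a*} - mu_{A_t}), and the set of alternatives Alt(mu) = {lambda in M : a*(lambda) != a*(mu)}. Every uniformly efficient algorithm (regret o(T^alpha) for all alpha > 0 on every instance in M) then satisfies:

```latex
\liminf_{T \to \infty} \frac{\mathbb{E}_\mu[R_T]}{\log T} \;\ge\; C(\mu)
  = \inf_{w \in \mathbb{R}_{\ge 0}^{K}} \Big\{ \sum_{a=1}^{K} w_a \Delta_a \;:\;
    \inf_{\lambda \in \mathrm{Alt}(\mu)} \sum_{a=1}^{K} w_a \, \mathrm{KL}(\mu_a, \lambda_a) \ge 1 \Big\}
```

Here w_a can be read as the number of pulls of arm a per unit of log T. The nested optimisation, an allocation w played against a worst-case alternative lambda, is the kind of saddle-point problem the abstract alludes to; how the paper's iterative solvers approximate it is not reproduced here.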

Additional Metadata
Series	Proceedings of Machine Learning Research
Conference	37th International Conference on Machine Learning, ICML 2020
Organisation	Machine Learning
Citation	Degenne, R., Shao, H., & Koolen-Wijkstra, W. (2020). Structure adaptive algorithms for stochastic bandits. In Proceedings of the International Conference on Machine Learning (pp. 2443–2452).

See Also
Software/data: Regret games paper, by R.R.B.P. Degenne (Rémy), H. Shao (Han) and W.M. Koolen-Wijkstra (Wouter)
