This paper presents new deviation inequalities that are valid uniformly in time under adaptive sampling in a multi-armed bandit model. The deviations are measured using the Kullback-Leibler divergence in a given one-dimensional exponential family, and take into account multiple arms at a time. They are obtained by constructing for each arm a mixture martingale based on a hierarchical prior, and by multiplying those martingales. Our deviation inequalities allow us to analyze stopping rules based on generalized likelihood ratios for a large class of sequential identification problems. We establish asymptotic optimality of sequential tests generalising the track-and-stop method to problems beyond best arm identification. We further derive sharper stopping thresholds, where the number of arms is replaced by the newly introduced pure exploration problem rank. We construct tight confidence intervals for linear functions and minima/maxima of the vector of arm means.

, , , ,
Journal of Machine Learning Research
Centrum Wiskunde & Informatica, Amsterdam (CWI), The Netherlands

Kaufmann, E, & Koolen-Wijkstra, W.M. (2021). Mixture martingales revisited with applications to sequential tests and confidence intervals. Journal of Machine Learning Research, 22, 1–44.