Bandits with many optimal arms

de Heide, Rianne; Cheshire, James; Ménard, Pierre; Carpentier, Alexandra

2021-12-06

Presented at the Thirty-fifth Conference on Neural Information Processing Systems (December 2021), Online

We consider a stochastic bandit problem with a possibly infinite number of arms. We write p∗ for the proportion of optimal arms and Δ for the minimal mean-gap between optimal and sub-optimal arms. We characterize the optimal learning rates in both the cumulative regret setting and the best-arm identification setting, in terms of the problem parameters T (the budget), p∗ and Δ. For cumulative regret minimization, we provide a lower bound of order Ω(log(T)/(p∗Δ)) and a UCB-style algorithm whose upper bound matches it up to a factor of log(1/Δ). Our algorithm needs p∗ to calibrate its parameters, and we prove that this knowledge is necessary, since adapting to p∗ in this setting is impossible. For best-arm identification, we provide a lower bound of order Ω(exp(−cTΔ²p∗)) on the probability of outputting a sub-optimal arm, where c > 0 is an absolute constant. We also provide an elimination algorithm whose upper bound matches the lower bound up to a factor of order log(T) in the exponential, and which needs neither p∗ nor Δ as a parameter. Our results apply directly to three related problems: competing against the j-th best arm, identifying an ϵ-good arm, and finding an arm with mean larger than a quantile of a known order.
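
To make the role of p∗ concrete, here is a minimal Python sketch of one generic approach to a bandit with many arms: draw enough arms uniformly at random that at least one optimal arm is included with high probability, then run standard UCB1 on that subsample. This is an illustration under stated assumptions and not the algorithm from the paper. The function names `subsample_size` and `ucb_on_subsample`, the failure probability `delta_fail`, and the reward range [0, 1] are all hypothetical choices made for this example.

```python
import math
import random


def subsample_size(p_star, delta_fail):
    """Number of uniformly drawn arms needed so that at least one optimal arm
    is included with probability >= 1 - delta_fail (hypothetical helper)."""
    # P(no optimal arm among n draws) = (1 - p_star)^n <= exp(-n * p_star),
    # so n >= log(1 / delta_fail) / p_star suffices.
    return math.ceil(math.log(1.0 / delta_fail) / p_star)


def ucb_on_subsample(pull, n_arms, horizon):
    """Run standard UCB1 for `horizon` rounds on arms 0..n_arms-1.

    `pull(i)` must return a reward in [0, 1] for arm i (assumption).
    Returns the number of times each arm was played.
    """
    counts = [0] * n_arms
    sums = [0.0] * n_arms
    for t in range(1, horizon + 1):
        if t <= n_arms:
            arm = t - 1  # play every arm once before using the index
        else:
            # UCB1 index: empirical mean plus exploration bonus
            arm = max(
                range(n_arms),
                key=lambda i: sums[i] / counts[i]
                + math.sqrt(2.0 * math.log(t) / counts[i]),
            )
        counts[arm] += 1
        sums[arm] += pull(arm)
    return counts


if __name__ == "__main__":
    rng = random.Random(0)
    p_star, gap, horizon = 0.1, 0.3, 5000
    # Choosing delta_fail = 1 / horizon gives about log(T) / p_star arms.
    n = subsample_size(p_star, 1.0 / horizon)
    # Simulated reservoir: each drawn arm is optimal with probability p_star.
    means = [0.8 if rng.random() < p_star else 0.8 - gap for _ in range(n)]
    plays = ucb_on_subsample(
        lambda i: 1.0 if rng.random() < means[i] else 0.0, n, horizon
    )
    print(n, sum(plays))
```

The subsample size grows like log(1/delta_fail)/p∗, which is one way to see why a procedure of this kind needs to know p∗ in advance: too few arms risks missing every optimal arm, and too many wastes budget on exploration.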

Additional Metadata
Keywords	Machine learning
Conference	Thirty-fifth Conference on Neural Information Processing Systems
Organisation	Machine Learning
Citation	de Heide, R., Cheshire, J., Ménard, P., & Carpentier, A. (2021). Bandits with many optimal arms. In Proceedings NeurIPS (Annual Conference on Neural Information Processing Systems) (pp. 22457–22469).
