Sequential test for the lowest mean: From Thompson to Murphy sampling
Learning the minimum/maximum mean among a finite set of distributions is a fundamental sub-task in planning, game tree search and reinforcement learning. We formalize this learning task as the problem of sequentially testing how the minimum mean among a finite set of distributions compares to a given threshold. We develop refined non-asymptotic lower bounds, which show that optimality mandates very different sampling behavior for a low vs high true minimum. We show that Thompson Sampling and the intuitive Lower Confidence Bounds policy each nail only one of these cases. We develop a novel approach that we call Murphy Sampling. Even though it entertains exclusively low true minima, we prove that MS is optimal for both possibilities. We then design advanced self-normalized deviation inequalities, fueling more aggressive stopping rules. We complement our theoretical guarantees by experiments showing that MS works best in practice.
|Annual Conference on Advances in Neural Information Processing Systems|
|Organisation||Centrum Wiskunde & Informatica, Amsterdam (CWI), The Netherlands|
Kaufmann, E, Koolen-Wijkstra, W.M, & Garivier, A. (2018). Sequential test for the lowest mean: From Thompson to Murphy sampling. In Advances in Neural Information Processing Systems (pp. 6332–6342).