2018-12-02
Sequential test for the lowest mean: From Thompson to Murphy sampling
Publication
Learning the minimum/maximum mean among a finite set of distributions is a fundamental sub-task in planning, game tree search and reinforcement learning. We formalize this learning task as the problem of sequentially testing how the minimum mean among a finite set of distributions compares to a given threshold. We develop refined non-asymptotic lower bounds, which show that optimality mandates very different sampling behavior for a low vs high true minimum. We show that Thompson Sampling and the intuitive Lower Confidence Bounds policy each nail only one of these cases. We develop a novel approach that we call Murphy Sampling. Even though it entertains exclusively low true minima, we prove that MS is optimal for both possibilities. We then design advanced self-normalized deviation inequalities, fueling more aggressive stopping rules. We complement our theoretical guarantees by experiments showing that MS works best in practice.
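To make the sampling rule concrete, here is a minimal Python sketch of a Murphy-Sampling-style arm selection: an ordinary Thompson (posterior) draw is kept only if its minimum mean falls below the threshold, and the arm attaining that minimum is pulled. This assumes Gaussian arms with known standard deviation, flat priors, rejection sampling for the conditioning step, and that every arm has been pulled once; the function name, the `max_rejects` cap, and the fallback are illustrative conveniences, not part of the paper.

```python
import numpy as np

def murphy_sampling_round(counts, sums, threshold, sigma=1.0,
                          rng=None, max_rejects=10_000):
    """Pick the next arm with a Murphy-Sampling-style rule (illustrative sketch).

    Assumes Gaussian arms with known standard deviation `sigma`, flat priors,
    and that every arm has been pulled at least once; `counts[k]` and
    `sums[k]` hold the pull count and reward sum of arm k.
    """
    rng = np.random.default_rng() if rng is None else rng
    post_mean = sums / counts                 # posterior means
    post_std = sigma / np.sqrt(counts)        # posterior standard deviations
    for _ in range(max_rejects):
        mu = rng.normal(post_mean, post_std)  # ordinary Thompson (posterior) draw
        if mu.min() < threshold:              # keep only draws with a low minimum
            return int(np.argmin(mu))         # pull the arm attaining that minimum
    # Fallback (not from the paper): conditioning event essentially ruled out
    return int(np.argmin(post_mean))


# Hypothetical usage: 3 arms, threshold 0, one initial pull per arm.
counts = np.array([1.0, 1.0, 1.0])
sums = np.array([0.3, -0.1, 0.8])
arm = murphy_sampling_round(counts, sums, threshold=0.0)
```

Note that the sketch "entertains exclusively low true minima" in exactly the sense of the abstract: every accepted posterior draw has its minimum below the threshold, yet the resulting sampling behavior can still be analyzed for both a low and a high true minimum.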
Additional Metadata | |
---|---|
Conference | Annual Conference on Advances in Neural Information Processing Systems |
Organisation | Centrum Wiskunde & Informatica, Amsterdam (CWI), The Netherlands |
Citation | Kaufmann, E., Koolen-Wijkstra, W., & Garivier, A. (2018). Sequential test for the lowest mean: From Thompson to Murphy sampling. In Advances in Neural Information Processing Systems (pp. 6332–6342). |