This paper addresses the challenge of online generalization in tree search. We propose Multiple Estimator Monte Carlo Tree Search (ME-MCTS), with a two-fold contribution: first, we introduce a formalization of online generalization that can represent existing techniques such as “history heuristics”, “RAVE”, or “OMA” – contextual action value estimators or abstractors that generalize across specific contexts. Second, we incorporate recent advances in estimator averaging that enable guiding search by combining the online action value estimates of any number of such abstractors or similar types of action value estimators. Unlike previous work, which usually proposed a single abstractor for either the selection or the rollout phase of MCTS simulations, our approach focuses on the combination of multiple estimators and applies them to all move choices in MCTS simulations. As the MCTS tree itself is just another value estimator – unbiased, but without abstraction – this blurs the traditional distinction between action choices inside and outside of the MCTS tree. Experiments with three abstractors in four board games show significant improvements of ME-MCTS over MCTS using only a single abstractor, both for MCTS with random rollouts as well as for MCTS with static evaluation functions. While we used deterministic, fully observable games, ME-MCTS naturally extends to more challenging settings.
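The idea of combining several contextual estimators can be illustrated with a minimal sketch. It is not the paper's algorithm: the abstract does not specify the averaging scheme, so plain inverse-variance weighting is swapped in here as an assumption, and all names (`Stats`, `Abstractor`, `combined_value`, the key functions) are hypothetical. Each abstractor maps a (state, action) pair to a context key and shares return statistics across everything with the same key; the tree is modelled as the abstractor whose key is the full (state, action) pair.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Stats:
    n: int = 0         # number of returns observed
    mean: float = 0.0  # running mean of returns
    m2: float = 0.0    # running sum of squared deviations (Welford's method)

    def update(self, ret: float) -> None:
        self.n += 1
        delta = ret - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (ret - self.mean)

    def var_of_mean(self) -> float:
        # Variance of the sample mean; assumed prior of 1.0 with fewer than 2 samples.
        if self.n < 2:
            return 1.0
        return max(self.m2 / (self.n - 1), 1e-6) / self.n


class Abstractor:
    """Online action-value estimator that pools returns over a context key."""

    def __init__(self, key_fn):
        self.key_fn = key_fn              # maps (state, action) to a hashable context
        self.table = defaultdict(Stats)

    def update(self, state, action, ret: float) -> None:
        self.table[self.key_fn(state, action)].update(ret)

    def stats(self, state, action) -> Stats:
        return self.table.get(self.key_fn(state, action), Stats())


def combined_value(estimators, state, action, default: float = 0.5) -> float:
    """Inverse-variance weighted average (an assumed stand-in for the paper's scheme)."""
    num = den = 0.0
    for est in estimators:
        s = est.stats(state, action)
        if s.n == 0:
            continue                      # estimator has no data for this context
        w = 1.0 / s.var_of_mean()
        num += w * s.mean
        den += w
    return num / den if den > 0 else default


# Usage: a tree-like estimator (no abstraction) and a history-heuristic-like one.
tree = Abstractor(lambda s, a: (s, a))
history = Abstractor(lambda s, a: a)      # generalizes over all states
for ret in (1.0, 0.0, 1.0):
    for est in (tree, history):
        est.update("s0", "a", ret)

seen_elsewhere = combined_value([tree, history], "s1", "a")  # only history has data
never_seen = combined_value([tree, history], "s1", "b")      # falls back to default
```

In this sketch the unvisited state `"s1"` still receives an informed value for action `"a"` because the state-independent abstractor has seen that action before, which is the kind of online generalization the abstract describes.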

doi.org/10.24963/ijcai.2021/555
IJCAI 2021
Intelligent and autonomous systems

Baier, H., & Kaisers, M. (2021). ME-MCTS: Online generalization by combining multiple value estimators. In Proceedings of the International Joint Conference on Artificial Intelligence (pp. 4032–4038). doi:10.24963/ijcai.2021/555