Robust online planning with imperfect models
Environment models are not always known a priori, and approximating stochastic transition dynamics may introduce errors, especially when only a small amount of data is available or model misspecification is a concern. This work introduces a robust decision-time planning method to cope with such imprecise models. The objective of robust planning is to find a policy with the best guaranteed performance, which we approach by transferring a two-stage minimization-maximization procedure from robust control to online planning. We assume that the environment is a Markov Decision Process and aim for the best worst-case performance within specified model error bounds. To compute solutions, we introduce a family of locally robust decision-time planning algorithms, specifically robust Monte Carlo Tree Search (rMCTS). We evaluate robust MCTS methods empirically with model error bounded in Wasserstein distance and find that the resulting robust policies yield safer and more uncertainty-aware behavior than their non-robust counterparts. Because the model error bounds and the corresponding model minimizers can be adapted, robust MCTS extends to a variety of online planning settings.
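To make the best worst-case objective concrete, the following is a minimal sketch of a single robust backup, not the rMCTS algorithm itself. It assumes, purely for illustration, that the model error bound is represented by a small finite set of candidate transition models rather than a Wasserstein ball. All names (`robust_action_values`, `robust_action`, `NOMINAL`, `PERTURBED`) and all numbers are hypothetical.

```python
from typing import Dict, List, Sequence

# A model maps each action to a probability vector over next states.
Model = Dict[str, List[float]]


def robust_action_values(
    candidate_models: Sequence[Model],
    rewards: Dict[str, List[float]],
    next_values: List[float],
    gamma: float = 0.95,
) -> Dict[str, float]:
    """Worst-case one-step value of each action over the candidate models."""
    values: Dict[str, float] = {}
    for action, action_rewards in rewards.items():
        per_model = []
        for model in candidate_models:
            probs = model[action]
            # Expected one-step return under this particular model.
            q = sum(
                p * (r + gamma * v)
                for p, r, v in zip(probs, action_rewards, next_values)
            )
            per_model.append(q)
        # Inner minimization: the least favorable model for this action.
        values[action] = min(per_model)
    return values


def robust_action(
    candidate_models: Sequence[Model],
    rewards: Dict[str, List[float]],
    next_values: List[float],
    gamma: float = 0.95,
) -> str:
    """Outer maximization: the action with the best worst-case value."""
    values = robust_action_values(candidate_models, rewards, next_values, gamma)
    return max(values, key=values.get)


# Hypothetical two-state example: next states are (good, bad).
REWARDS = {"safe": [1.0, 1.0], "risky": [3.0, -5.0]}
NEXT_VALUES = [0.0, 0.0]
NOMINAL = {"safe": [0.5, 0.5], "risky": [0.9, 0.1]}    # estimated model
PERTURBED = {"safe": [0.5, 0.5], "risky": [0.7, 0.3]}  # model within the error bound

nominal_choice = robust_action([NOMINAL], REWARDS, NEXT_VALUES)
robust_choice = robust_action([NOMINAL, PERTURBED], REWARDS, NEXT_VALUES)
```

With these made-up numbers, planning on the nominal model alone selects the risky action, while the robust backup selects the safe one, because the risky action's value falls below the safe action's value under the perturbed model. A Wasserstein-bounded variant would replace the finite minimum with an optimization over all models within the given distance of the nominal one.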