An exploration strategy facing non-stationary agents (JAAMAS paper)
The success or failure of any learning algorithm depends partly on the exploration strategy it employs. However, most exploration strategies assume that the environment is stationary and non-strategic. This work investigates how to design exploration strategies for non-stationary and adversarial environments. Our experimental setting is a two-agent strategic interaction in which the opponent switches between different behavioral patterns. The agent's objective is to learn a model of the opponent's strategy in order to act optimally, despite non-determinism and stochasticity. Our contribution is twofold. First, we present drift exploration as a strategy for switch detection. Second, we propose a new algorithm, R-MAX#, that reasons and acts in terms of two objectives: 1) to maximize utilities in the short term while learning, and 2) to eventually explore implicitly, looking for changes in the opponent's behavior. We provide theoretical results showing that R-MAX# is guaranteed to detect the opponent's switch and learn a new model with finite sample complexity.
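To make the idea of implicit drift exploration concrete, here is a minimal sketch of R-MAX-style bookkeeping in which a state-action pair that has been "known" for a while is deliberately forgotten, so optimism drives the agent to revisit it and notice whether the opponent has switched. This is a generic illustration under stated assumptions, not the authors' R-MAX#: the class name `DriftAwareRMaxModel`, the forgetting period `tau`, and the choice to track only average rewards (no transition model or planning) are all hypothetical simplifications.

```python
from collections import defaultdict


class DriftAwareRMaxModel:
    """Hypothetical R-MAX-style reward model with periodic forgetting.

    Not the published R-MAX# algorithm; an illustrative sketch only.
    """

    def __init__(self, r_max: float, m: int, tau: int):
        self.r_max = r_max  # optimistic value assigned to unknown pairs
        self.m = m          # visits needed before a pair counts as known
        self.tau = tau      # hypothetical: steps a pair stays known before being forgotten
        self.counts = defaultdict(int)
        self.reward_sums = defaultdict(float)
        self.known_since = {}  # (state, action) -> step at which it became known
        self.step = 0

    def _expire(self) -> None:
        # Drift exploration (as assumed here): pairs known for tau steps are
        # reset to unknown, so the optimistic r_max pulls the agent back to them.
        for key, since in list(self.known_since.items()):
            if self.step - since >= self.tau:
                del self.known_since[key]
                self.counts[key] = 0
                self.reward_sums[key] = 0.0

    def update(self, state, action, reward: float) -> None:
        self.step += 1
        self._expire()
        key = (state, action)
        if key in self.known_since:
            return  # as in R-MAX, a known pair's estimate is frozen
        self.counts[key] += 1
        self.reward_sums[key] += reward
        if self.counts[key] >= self.m:
            self.known_since[key] = self.step

    def estimated_reward(self, state, action) -> float:
        key = (state, action)
        if key not in self.known_since:
            return self.r_max  # optimism in the face of uncertainty
        return self.reward_sums[key] / self.counts[key]
```

For example, with `m=2` and `tau=3`, a pair becomes known after two visits, reports its empirical mean reward, and three steps later reverts to `r_max`, which makes it attractive to explore again. The trade-off this illustrates is the one the abstract names: a smaller `tau` detects switches sooner but spends more steps re-exploring instead of exploiting.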
|Exploration, Non-stationary environments, Repeated games|
|International Joint Conference on Autonomous Agents and Multiagent Systems|
|Organisation|Centrum Wiskunde & Informatica, Amsterdam, The Netherlands|
Hernandez-Leal, P, Zhan, Y, Taylor, M.E, Munoz de Cote, E, & Sucar, L.E. (2017). An exploration strategy facing non-stationary agents (JAAMAS paper). In Proceedings of the International Joint Conference on Autonomous Agents and Multiagent Systems, AAMAS (pp. 922–923).