Towards a fast detection of opponents in repeated stochastic games
Multi-agent algorithms aim to find the best response in strategic interactions. While many state-of-the-art algorithms assume repeated interaction with a fixed set of opponents (or even self-play), a learner in the real world is more likely to encounter the same strategic situation with changing counterparties. This article presents a formal model of such sequential interactions, and a corresponding algorithm that combines two established frameworks: Pepper and Bayesian policy reuse. In each interaction, the algorithm faces a repeated stochastic game with an unknown (small) number of repetitions against a random opponent drawn from a population, without observing the opponent's identity. The algorithm has two main steps. First, it draws on multi-agent algorithms to obtain acting policies in stochastic games. Second, it computes a belief over the possible opponents and updates it as the interaction unfolds. This allows the agent to quickly select the appropriate policy against the current opponent. Our results show fast detection of the opponent from its behavior, yielding higher average rewards than the state-of-the-art baseline Pepper in repeated stochastic games.
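The second step (a belief over possible opponents, updated during the interaction, used to pick a response policy) can be illustrated with a minimal sketch in the spirit of Bayesian policy reuse. It is not the authors' algorithm and omits the Pepper component entirely. Several assumptions are made purely for illustration: each known opponent type is summarized by a hypothetical table of action probabilities per state, the observation signal is the opponent's action, and a hypothetical `performance` table gives the expected reward of each response policy against each opponent type. The names `update_belief` and `select_policy` are also hypothetical.

```python
from typing import Dict, List, Sequence

# Hypothetical opponent model: state -> {opponent action: probability}.
OpponentModel = Dict[str, Dict[str, float]]


def update_belief(belief: Sequence[float], models: Sequence[OpponentModel],
                  state: str, opp_action: str, eps: float = 1e-6) -> List[float]:
    """Bayes rule over opponent types: posterior is proportional to prior * likelihood.

    The likelihood of the observed opponent action under each model is an
    assumption of this sketch; eps keeps a single unexpected action from
    ruling out a type permanently.
    """
    likelihoods = [m.get(state, {}).get(opp_action, 0.0) + eps for m in models]
    unnormalized = [b * l for b, l in zip(belief, likelihoods)]
    total = sum(unnormalized)
    return [u / total for u in unnormalized]


def select_policy(belief: Sequence[float],
                  performance: Sequence[Sequence[float]]) -> int:
    """Return the response policy with the highest belief-weighted expected reward.

    performance[t][p] is the assumed expected reward of policy p against type t.
    """
    n_policies = len(performance[0])
    expected = [sum(b * performance[t][p] for t, b in enumerate(belief))
                for p in range(n_policies)]
    return max(range(n_policies), key=lambda p: expected[p])


if __name__ == "__main__":
    # Two illustrative opponent types in a single state "s0".
    models = [{"s0": {"C": 1.0}}, {"s0": {"D": 1.0}}]
    performance = [[3.0, 1.0],   # rewards of policies 0 and 1 against type 0
                   [0.0, 1.0]]   # rewards of policies 0 and 1 against type 1
    belief = [0.5, 0.5]
    belief = update_belief(belief, models, "s0", "D")
    print(belief, select_policy(belief, performance))
```

After one observed "D", almost all belief mass moves to the second opponent type and policy 1 is selected. Selecting by belief-weighted expected reward is one simple choice; Bayesian policy reuse also admits other selection criteria, such as probability of improvement.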
|Stochastic games, Reinforcement learning, Multi-agent learning, Policy reuse|
|International Conference on Autonomous Agents and Multi-Agent Systems|
Hernandez-Leal, P., & Kaisers, M. (2017). Towards a fast detection of opponents in repeated stochastic games. In Lecture Notes in Computer Science/Lecture Notes in Artificial Intelligence (pp. 239–257). doi:10.1007/978-3-319-71682-4_15