This article considers multiagent algorithms that aim to find best responses in strategic interactions by learning about the game and their opponents from observations. In contrast to many state-of-the-art algorithms that assume repeated interaction with a fixed set of opponents (or even self-play), a learner in the real world is more likely to encounter the same strategic situation with changing counterparties. First, we present a formal model of such sequential interactions, in which subsets of the player population are drawn sequentially to play a repeated stochastic game with an unknown (small) number of repetitions. In this setting the agents observe their joint actions but not the identity of their opponents. Second, we propose a learning algorithm to act in these sequential interactions. Our algorithm explicitly models the different opponents and their switching frequency to obtain an acting policy. It combines the multiagent algorithm PEPPER for repeated stochastic games with Bayesian inference to compute a belief over the hypothesized opponent behaviors, which is updated during interaction. This enables the agent to select the appropriate opponent model and to compute an adequate response. Our results show that the opponent is detected efficiently from its observed behavior, yielding higher average rewards than a baseline that does not model opponents in repeated stochastic games.
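As a rough illustration of the belief update described above, the following Python sketch maintains a Bayesian posterior over a small set of hypothesized opponent behaviors, mixes in a switching probability to account for a new opponent being drawn from the population, and selects the most likely model to respond to. The opponent models, state labels, and switching probability are illustrative assumptions, not the paper's exact formulation (which builds on PEPPER for repeated stochastic games).

```python
import numpy as np

# Hypothetical sketch of the Bayesian opponent-identification step.
# Each hypothesized opponent behavior maps a state to a distribution
# over the opponent's actions (illustrative models, not the paper's).
opponent_models = {
    "mostly_cooperative": lambda state: {"cooperate": 0.9, "defect": 0.1},
    "mostly_defecting":   lambda state: {"cooperate": 0.1, "defect": 0.9},
}

# Uniform prior belief over the hypothesized opponents.
belief = {name: 1.0 / len(opponent_models) for name in opponent_models}

def update_belief(belief, state, observed_action, switch_prob=0.05):
    """Bayesian update after observing the opponent's action.

    switch_prob mixes the posterior with a uniform distribution to account
    for the possibility that a new opponent has been drawn from the
    population (an assumed simplification of the switching model).
    """
    posterior = {}
    for name, model in opponent_models.items():
        likelihood = model(state).get(observed_action, 1e-9)
        posterior[name] = likelihood * belief[name]
    total = sum(posterior.values())
    posterior = {n: p / total for n, p in posterior.items()}
    uniform = 1.0 / len(opponent_models)
    return {n: (1 - switch_prob) * p + switch_prob * uniform
            for n, p in posterior.items()}

# Example: observations largely consistent with a defecting opponent.
for obs in ["defect", "defect", "cooperate", "defect"]:
    belief = update_belief(belief, state="s0", observed_action=obs)

best_model = max(belief, key=belief.get)  # model used to compute the response
print(best_model, belief)
```

In the full algorithm, the selected model would feed into the response computation of PEPPER rather than being printed; the sketch only shows how the belief concentrates on the behavior consistent with the observations.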


Hernandez-Leal, P., & Kaisers, M. (2017). Learning against sequential opponents in repeated stochastic games. In Multi-disciplinary Conference on Reinforcement Learning and Decision Making (pp. 52–56).