Robust multi-agent Q-learning in cooperative games with adversaries
We present RoM-Q, a new Q-learning-like algorithm for finding policies robust to attacks in multi-agent systems (MAS). We consider a novel type of attack, in which a team of adversaries, aware of the optimal multi-agent Q-value function, performs a worst-case selection of both the agents to attack and the actions they perform. Our motivation lies in real-world MAS, where particular agents become vulnerable because of their characteristics and robust policies must be learned without simulating attacks during training. In our simulations, we train policies using RoM-Q, Q-learning and minimax-Q, and derive the corresponding adversarial attacks. Policies learned using RoM-Q prove the most robust: they accrue the highest rewards against all considered adversarial attacks.
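The attack model above can be made concrete with a small sketch. It assumes a tabular Q-function over joint actions for a single fixed state, a hypothetical attack budget `budget` (the number of agents the adversaries may take over, which the abstract does not specify), and brute-force enumeration over agent subsets and replacement actions. It shows only the worst-case attack selection and a greedy joint action that is best under that worst case. It is an illustration of the threat model, not the authors' RoM-Q update rule.

```python
from itertools import combinations, product

def worst_case_attack(q_values, joint_action, n_actions, budget):
    """Return (value, attacked_agents, attacked_joint_action) for the attack on at
    most `budget` agents that minimises Q. `budget` is a hypothetical parameter."""
    n_agents = len(joint_action)
    # Attacking nobody is always possible, so start from the unattacked value.
    best = (q_values[tuple(joint_action)], (), tuple(joint_action))
    for k in range(1, budget + 1):
        # Worst-case choice of WHICH agents to attack ...
        for agents in combinations(range(n_agents), k):
            # ... and of WHAT actions the attacked agents perform instead.
            for replacement in product(range(n_actions), repeat=k):
                attacked = list(joint_action)
                for agent, action in zip(agents, replacement):
                    attacked[agent] = action
                value = q_values[tuple(attacked)]
                if value < best[0]:
                    best = (value, agents, tuple(attacked))
    return best

def robust_greedy_action(q_values, n_agents, n_actions, budget):
    """Joint action whose worst-case (post-attack) Q-value is highest."""
    return max(
        product(range(n_actions), repeat=n_agents),
        key=lambda a: worst_case_attack(q_values, a, n_actions, budget)[0],
    )

# Hypothetical Q-values for one state: 2 agents, 2 actions each.
Q_EXAMPLE = {(0, 0): 10.0, (0, 1): -10.0, (1, 0): 2.0, (1, 1): 3.0}

if __name__ == "__main__":
    print(max(Q_EXAMPLE, key=Q_EXAMPLE.get))                # → (0, 0)
    print(worst_case_attack(Q_EXAMPLE, (0, 0), 2, 1))       # → (-10.0, (1,), (0, 1))
    print(robust_greedy_action(Q_EXAMPLE, 2, 2, budget=1))  # → (1, 0)
```

With a budget of one agent, the plain greedy joint action `(0, 0)` collapses to -10 when the adversary takes over agent 1, whereas `(1, 0)` guarantees at least 2. Enumeration is exponential in the number of agents, so this sketch is only meant for tiny tabular examples.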
AAAI-21 Workshop on Reinforcement Learning in Games
Organisation: Intelligent and autonomous systems
Nisioti, E, Bloembergen, D, & Kaisers, M. (2021). Robust multi-agent Q-learning in cooperative games with adversaries. In AAAI-21 Workshop on Reinforcement Learning in Games.