An exploration strategy for non-stationary opponents

Hernandez-Leal, Pablo; Zhan, Yusen; Taylor, Matthew; Sucar, Enrique; Munoz de Cote, Enrique

doi:10.1007/s10458-016-9347-3

P. Hernandez-Leal (Pablo), Y. Zhan (Yusen), M.E. Taylor (Matthew), L.E. Sucar (Enrique) and E. Munoz de Cote (Enrique)

2017-09-01

Autonomous Agents and Multi-Agent Systems, Volume 31, Issue 5, p. 971–1002

The success or failure of any learning algorithm depends in part on the exploration strategy it uses. However, most exploration strategies assume that the environment is stationary and non-strategic. In this work we shed light on how to design exploration strategies in non-stationary and adversarial environments. Our proposed adversarial drift exploration (DE) is able to efficiently explore the state space while keeping track of regions of the environment that have changed. This proposed exploration is general enough to be applied in single-agent non-stationary environments as well as in multiagent settings where the opponent changes its strategy over time. We use a two-agent strategic interaction setting to test this new type of exploration, where the opponent switches between different behavioral patterns to emulate a non-deterministic, stochastic and adversarial environment. The agent’s objective is to learn a model of the opponent’s strategy in order to act optimally. Our contribution is twofold. First, we present DE as a strategy for switch detection. Second, we propose a new algorithm called R-max# for learning and planning against non-stationary opponents. To handle such opponents, R-max# reasons and acts in terms of two objectives: (1) to maximize utilities in the short term while learning and (2) to eventually explore for changes in the opponent’s behavior. We provide theoretical results showing that R-max# is guaranteed to detect the opponent’s switch and learn a new model with finite sample complexity. R-max# makes efficient use of exploration experiences, which results in rapid adaptation and efficient DE, to deal with the non-stationary nature of the opponent. We show experimentally that using DE outperforms state-of-the-art algorithms that were explicitly designed for modeling opponents (in terms of average rewards) in two complementary domains.
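The abstract does not spell out how drift exploration revisits parts of the environment that may have changed. One way to picture the idea is an R-max-style reward model that forgets state-action pairs it has not visited recently, so that an optimistic planner is drawn back to them. The following is a minimal sketch under that assumption, not the authors' implementation: the class name `DriftExplorationModel`, the parameters `r_max`, `m` and `tau`, and the state and action labels are all hypothetical.

```python
from collections import defaultdict


class DriftExplorationModel:
    """Toy R-max-style reward model that forgets stale state-action pairs.

    Illustrative only: names and parameters are assumptions, not taken
    from the paper.
    """

    def __init__(self, r_max, m, tau):
        self.r_max = r_max  # optimistic reward assigned to unknown pairs
        self.m = m          # visits needed before a pair counts as known
        self.tau = tau      # rounds without a visit before a pair is forgotten
        self.counts = defaultdict(int)
        self.reward_sums = defaultdict(float)
        self.last_visit = {}

    def update(self, state, action, reward, t):
        """Record one observed reward for (state, action) at round t."""
        key = (state, action)
        self.counts[key] += 1
        self.reward_sums[key] += reward
        self.last_visit[key] = t

    def is_known(self, state, action):
        """A pair is known once it has been visited at least m times."""
        return self.counts[(state, action)] >= self.m

    def reward(self, state, action):
        """Empirical mean for known pairs, optimistic r_max otherwise."""
        key = (state, action)
        if not self.is_known(state, action):
            return self.r_max
        return self.reward_sums[key] / self.counts[key]

    def forget_stale(self, t):
        """Reset every pair not visited in the last tau rounds to unknown."""
        stale = [k for k, seen in self.last_visit.items() if t - seen >= self.tau]
        for key in stale:
            self.counts[key] = 0
            self.reward_sums[key] = 0.0
            del self.last_visit[key]
        return stale


model = DriftExplorationModel(r_max=1.0, m=2, tau=5)
model.update("s0", "cooperate", 0.25, t=0)
model.update("s0", "cooperate", 0.75, t=1)
learned = model.reward("s0", "cooperate")   # empirical mean while the pair is known
forgotten = model.forget_stale(t=6)         # pair was last visited 5 rounds ago
optimistic = model.reward("s0", "cooperate")  # back to r_max, inviting re-exploration
```

In this sketch the forgetting window `tau` trades off short-term reward against how quickly a switch in the opponent's behavior can be noticed: a small window re-explores often, a large one exploits the learned model for longer.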

Additional Metadata
Keywords	Learning, Exploration, Non-stationary environments, Switching strategies, Repeated games
THEME	Networks (theme 7)
Publisher	Springer
Persistent URL	doi.org/10.1007/s10458-016-9347-3
Journal	Autonomous Agents and Multi-Agent Systems
Project	Demand response for grid-friendly quasi-autarkic energy cooperatives
Grant	This work was funded by The Netherlands Organisation for Scientific Research (NWO); grant id nwo/651.001.003 - Demand response for grid-friendly quasi-autarkic energy cooperatives
Organisation	Intelligent and autonomous systems
Citation	Hernandez-Leal, P., Zhan, Y., Taylor, M., Sucar, E., & Munoz de Cote, E. (2017). An exploration strategy for non-stationary opponents. Autonomous Agents and Multi-Agent Systems, 31(5), 971–1002. doi:10.1007/s10458-016-9347-3
