Robust temporal difference learning for critical domains

Klima, Richard; Bloembergen, Daniel; Kaisers, Michael; Tuyls, Karl

2019-05-13

Presented at the International Conference on Autonomous Agents and Multiagent Systems (May 2019), Montréal, Canada

We present a new Q-function operator for temporal difference (TD) learning methods that explicitly encodes robustness against significant rare events (SREs) in critical domains. The operator, which we call the κ-operator, makes it possible to learn a robust policy in a model-based fashion without actually observing the SREs. We introduce single-agent and multi-agent robust TD methods based on the κ-operator. Using the theory of Generalized Markov Decision Processes, we prove that the operator converges to the optimal robust Q-function with respect to the model. In addition, we prove convergence to the optimal Q-function of the original MDP as the probability of SREs vanishes. Empirical evaluations demonstrate the superior performance of κ-based TD methods both in the early learning phase and in the final converged stage. We also show that the proposed method is robust to small model errors and applicable in a multi-agent context.
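The abstract does not spell out the κ-operator itself. As a hedged illustration only, the sketch below assumes one plausible reading: with an assumed SRE probability κ, the one-step TD target blends the greedy value of the next state with its worst-case (minimum) action value. Under this assumption κ = 0 recovers the standard Q-learning target, which is consistent with the stated convergence to the original MDP's optimal Q-function as the SRE probability vanishes. The function names `kappa_target` and `kappa_q_update`, the tabular list-of-lists Q-table, and the exact max/min blending are hypothetical choices for illustration, not the authors' implementation.

```python
from typing import List, Sequence


def kappa_target(reward: float, q_next: Sequence[float], gamma: float,
                 kappa: float, terminal: bool = False) -> float:
    """Robust TD target under an ASSUMED form of the kappa-operator.

    Assumption: with probability (1 - kappa) the agent acts greedily in the
    next state; with probability kappa an SRE forces the worst action.
    """
    if terminal:
        # No bootstrapping from a terminal state.
        return reward
    robust_value = (1.0 - kappa) * max(q_next) + kappa * min(q_next)
    return reward + gamma * robust_value


def kappa_q_update(Q: List[List[float]], s: int, a: int, r: float,
                   s_next: int, alpha: float, gamma: float, kappa: float,
                   terminal: bool = False) -> None:
    """One tabular TD step moving Q[s][a] toward the kappa-target (in place)."""
    target = kappa_target(r, Q[s_next], gamma, kappa, terminal)
    Q[s][a] += alpha * (target - Q[s][a])


if __name__ == "__main__":
    # Hypothetical two-state example: in state 1, action 0 is good (1.0)
    # and action 1 is catastrophic (-10.0).
    Q = [[0.0, 0.0], [1.0, -10.0]]
    print(kappa_target(0.0, Q[1], gamma=0.9, kappa=0.0))  # plain Q-learning target
    print(kappa_target(0.0, Q[1], gamma=0.9, kappa=0.1))  # robust target
    kappa_q_update(Q, s=0, a=1, r=0.0, s_next=1, alpha=0.5, gamma=0.9, kappa=0.1)
    print(Q[0])
```

With κ = 0 the update reduces to ordinary Q-learning. Raising κ pulls a state's value toward its worst-case action, so in this toy example a single catastrophic action can outweigh a good one even though it is assumed to be forced only rarely.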

Additional Metadata
Keywords	Reinforcement learning, Robust learning, Multi-agent learning
Conference	International Conference on Autonomous Agents and Multiagent Systems
Organisation	Intelligent and autonomous systems
Citation	Klima, R., Bloembergen, D., Kaisers, M., & Tuyls, K. (2019). Robust temporal difference learning for critical domains. Proceedings of the International Conference on Autonomous Agents and Multiagent Systems, 350–358.
