Addressing the Policy-bias of Q-learning by Repeating Updates

Abdallah, Sherief; Kaisers, Michael

Q-learning is a very popular reinforcement learning algorithm that has been proven to converge to optimal policies in Markov decision processes. However, Q-learning shows artifacts in non-stationary environments: for example, the probability of playing the optimal action may decrease if Q-values deviate significantly from the true values, a situation that may arise in the initial phase as well as after changes in the environment. These artifacts were resolved in the literature by the variant Frequency Adjusted Q-learning (FAQL). However, FAQL also suffered from practical concerns that limited the policy subspace for which the behavior was improved. Here, we introduce Repeated Update Q-learning (RUQL), a variant of Q-learning that resolves the undesirable artifacts of Q-learning without the practical concerns of FAQL. We show, both theoretically and experimentally, the similarities and differences between RUQL and FAQL (the closest state-of-the-art). Experimental results verify the theoretical insights and show how RUQL outperforms FAQL and QL in non-stationary environments.
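
The idea of repeating an update can be illustrated with a minimal sketch. It assumes that "repeating updates" means applying the standard Q-learning update several times against the same target value; the abstract does not give the exact rule, so the function names, the parameter n, and the suggestion of tying n to the inverse of the action's selection probability are all illustrative assumptions rather than the authors' definition. The sketch only shows that n repetitions of one update collapse into a single closed-form step.

```python
def q_update(q, target, alpha):
    """One standard Q-learning step toward a fixed target (reward + discounted max)."""
    return (1 - alpha) * q + alpha * target


def repeated_update(q, target, alpha, n):
    """Apply the same update n times in a loop (n must be a non-negative integer)."""
    for _ in range(n):
        q = q_update(q, target, alpha)
    return q


def repeated_update_closed_form(q, target, alpha, n):
    """Equivalent single step: the old value keeps weight (1 - alpha) ** n.

    Unlike the loop, this form also accepts a non-integer n, for example
    n = 1 / pi_a for an action chosen with probability pi_a (an assumption
    made here for illustration only).
    """
    keep = (1 - alpha) ** n
    return keep * q + (1 - keep) * target


if __name__ == "__main__":
    looped = repeated_update(0.0, 1.0, 0.1, 5)
    closed = repeated_update_closed_form(0.0, 1.0, 0.1, 5)
    print(abs(looped - closed) < 1e-12)
```

Under this assumption, an action that is selected rarely would receive a larger effective step when it is finally played, which is one way to counteract the dependence of the update frequency on the current policy.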

Additional Metadata
Keywords	Q-learning, Non-stationary Environment, Dynamics
MSC	Population dynamics (general) (msc 92D25), Stochastic learning and adaptive control (msc 93E35), Artificial intelligence (msc 68Txx), Learning and adaptive systems (msc 68T05)
THEME	Null option (theme 11)
Editor	T. Ito (Tsuyoshi), C.M. Jonker (Catholijn), M. Gini, O. Shehory (Onn)
Conference	International Joint Conference on Autonomous Agents and Multiagent Systems
Organisation	Intelligent and autonomous systems
Citation	Abdallah, S., & Kaisers, M. (2013). Addressing the Policy-bias of Q-learning by Repeating Updates. In T. Ito, C. Jonker, M. Gini, & O. Shehory (Eds.), .