Addressing environment non-stationarity by repeating Q-learning updates

Abdallah, Sherief; Kaisers, Michael

2016-04-01

Journal of Machine Learning Research, Volume 17, p. 46

Q-learning (QL) is a popular reinforcement learning algorithm that is guaranteed to converge to optimal policies in Markov decision processes. However, QL exhibits an artifact: in expectation, the effective rate of updating the value of an action depends on the probability of choosing that action. In other words, there is a tight coupling between the learning dynamics and underlying execution policy. This coupling can cause performance degradation in noisy non-stationary environments. Here, we introduce Repeated Update Q-learning (RUQL), a learning algorithm that resolves the undesirable artifact of Q-learning while maintaining simplicity. We analyze the similarities and differences between RUQL, QL, and the closest state-of-the-art algorithms theoretically. Our analysis shows that RUQL maintains the convergence guarantee of QL in stationary environments, while relaxing the coupling between the execution policy and the learning dynamics. Experimental results confirm the theoretical insights and show how RUQL outperforms both QL and the closest state-of-the-art algorithms in noisy non-stationary environments. ©2016 Sherief Abdallah and Michael Kaisers.
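The coupling described in the abstract can be illustrated with a minimal sketch. The abstract does not state the update rule, so the code below assumes that RUQL repeats the standard Q-learning update 1/π(s,a) times, where π(s,a) is the probability with which the current policy selects action a in state s, and it collapses the repetition into a closed form. The function names, parameter names, and default values are hypothetical and chosen only for illustration; the exact rule should be checked against the full text.

```python
import math


def ql_update(q, reward, next_max, alpha=0.1, gamma=0.9):
    """One standard Q-learning update for a single (state, action) value."""
    target = reward + gamma * next_max
    return (1.0 - alpha) * q + alpha * target


def ruql_update(q, reward, next_max, pi_a, alpha=0.1, gamma=0.9):
    """Assumed RUQL-style update: repeat the QL update 1/pi_a times.

    pi_a is the probability with which the policy chose the action and
    must be strictly positive. Repeating the update n times against a
    fixed target leaves a fraction (1 - alpha)**n of the old value, so
    the repetition is written here in closed form with n = 1/pi_a.
    """
    if pi_a <= 0.0:
        raise ValueError("pi_a must be strictly positive")
    target = reward + gamma * next_max
    keep = (1.0 - alpha) ** (1.0 / pi_a)
    return keep * q + (1.0 - keep) * target


if __name__ == "__main__":
    # An action chosen with probability 0.5 receives the equivalent of two
    # ordinary updates, offsetting how rarely it is selected.
    once = ql_update(0.0, 1.0, 0.0)
    twice = ql_update(once, 1.0, 0.0)
    repeated = ruql_update(0.0, 1.0, 0.0, pi_a=0.5)
    print(math.isclose(twice, repeated))
```

Under this assumption, an action that is selected rarely receives a proportionally larger update each time it is selected, so the expected rate at which its value changes no longer scales with its selection probability. With `pi_a = 1.0` the sketch reduces to the ordinary Q-learning update.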

Additional Metadata
Journal	Journal of Machine Learning Research
Organisation	Information Systems
Citation	Abdallah, S., & Kaisers, M. (2016). Addressing environment non-stationarity by repeating Q-learning updates. Journal of Machine Learning Research, 17.

