2013
Addressing the Policy-bias of Q-learning by Repeating Updates
Publication
Presented at the International Joint Conference on Autonomous Agents and Multiagent Systems, St. Paul, MN, USA
Q-learning is a very popular reinforcement learning algorithm that has been proven to converge to optimal policies in Markov decision processes. However, Q-learning exhibits artifacts in non-stationary environments: for example, the probability of playing the optimal action may decrease if the Q-values deviate significantly from the true values, a situation that can arise in the initial learning phase as well as after changes in the environment. These artifacts were resolved in the literature by the variant Frequency Adjusted Q-learning (FAQL). However, FAQL suffers from practical concerns that limit the policy subspace for which the behavior is improved. Here, we introduce Repeated Update Q-learning (RUQL), a variant of Q-learning that resolves the undesirable artifacts of Q-learning without the practical concerns of FAQL. We show, both theoretically and experimentally, the similarities and differences between RUQL and FAQL (the closest state of the art). Experimental results verify the theoretical insights and show how RUQL outperforms FAQL and Q-learning in non-stationary environments.
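The abstract does not spell out the update rule, but the idea named in the title is to repeat the standard Q-learning update for the selected action more often the less likely that action was to be chosen. The sketch below illustrates that idea only and is not the paper's exact algorithm: it assumes the update is repeated roughly 1/π(a|s) times and applies the closed-form effect of n repeated updates toward a fixed target; the function names and the probability floor are hypothetical.

```python
import numpy as np

def q_update(q, s, a, r, s_next, alpha, gamma):
    """Standard Q-learning update for a tabular Q array q[state, action]."""
    target = r + gamma * np.max(q[s_next])
    q[s, a] += alpha * (target - q[s, a])

def repeated_update(q, s, a, r, s_next, alpha, gamma, pi_sa):
    """Repeated-update sketch: act as if the standard update were applied
    about n = 1/pi(a|s) times toward a fixed target, using the closed form
    (1 - alpha)^n instead of an explicit loop."""
    target = r + gamma * np.max(q[s_next])
    n = 1.0 / max(pi_sa, 1e-6)        # repeat count; floor guards against pi ~ 0
    decay = (1.0 - alpha) ** n        # cumulative effect of n repeated updates
    q[s, a] = decay * q[s, a] + (1.0 - decay) * target
```

Under this sketch, rarely selected actions receive a proportionally larger effective learning step, which is the mechanism the abstract credits with removing the policy bias of plain Q-learning.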
Additional Metadata | |
---|---|
Editor | T. Ito (Tsuyoshi), C.M. Jonker (Catholijn), M. Gini, O. Shehory (Onn) |
Conference | International Joint Conference on Autonomous Agents and Multiagent Systems |
Organisation | Intelligent and autonomous systems |
Citation | Abdallah, S., & Kaisers, M. (2013). Addressing the Policy-bias of Q-learning by Repeating Updates. In T. Ito, C. Jonker, M. Gini, & O. Shehory (Eds.). |