2020-03-01
A hybrid stochastic policy gradient algorithm for reinforcement learning
Publication
We propose a novel hybrid stochastic policy gradient estimator by combining an unbiased policy gradient estimator, the REINFORCE estimator, with another biased one, an adapted SARAH estimator, for policy optimization. The hybrid policy gradient estimator is shown to be biased, but has a variance-reduced property. Using this estimator, we develop a new Proximal Hybrid Stochastic Policy Gradient Algorithm (ProxHSPGA) to solve a composite policy optimization problem that allows us to handle constraints or regularizers on the policy parameters. We first propose a single-looped algorithm, then introduce a more practical restarting variant. We prove that both algorithms can achieve the best-known trajectory complexity O(ε^-3) to attain a first-order stationary point for the composite problem, which is better than the existing REINFORCE/GPOMDP O(ε^-4) and SVRPG O(ε^-10/3) complexities in the non-composite setting. We evaluate the performance of our algorithm on several well-known examples in reinforcement learning. Numerical results show that our algorithm outperforms two existing methods on these examples. Moreover, the composite settings indeed have some advantages compared to the non-composite ones on certain problems.
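The core idea in the abstract, blending an unbiased estimator with a SARAH-style recursive one, can be sketched in a few lines. The following is a minimal illustration on a toy stochastic quadratic objective, not the paper's policy optimization setup: the objective f(θ) = E[0.5·||θ − x||²] with x ~ N(μ, I) stands in for the expected return, a plain stochastic gradient stands in for REINFORCE, and the mixing weight `beta` and step size `eta` are illustrative choices.

```python
import numpy as np

# Hedged sketch of the hybrid estimator described in the abstract: a convex
# combination of an unbiased stochastic gradient (standing in for REINFORCE)
# and a SARAH-style recursive difference estimator. Toy setup, not the
# paper's actual policy-gradient formulation.

rng = np.random.default_rng(0)
mu = np.array([1.0, -2.0])  # minimizer of the toy objective

def stoch_grad(theta):
    """Unbiased stochastic gradient of 0.5 * E||theta - x||^2, x ~ N(mu, I)."""
    return theta - (mu + rng.standard_normal(2))

theta = np.zeros(2)
v = stoch_grad(theta)       # initialize with a plain unbiased estimate
beta, eta = 0.5, 0.1        # mixing weight and step size (illustrative)

for _ in range(2000):
    theta_next = theta - eta * v
    # SARAH part: evaluate the SAME fresh sample at both iterates, so the
    # difference term has small variance when consecutive iterates are close.
    x = mu + rng.standard_normal(2)
    sarah = v + (theta_next - x) - (theta - x)
    # Unbiased part: an independent fresh sample at the new iterate.
    u = stoch_grad(theta_next)
    # Hybrid recurrence: v_new = beta * u + (1 - beta) * sarah.
    v = beta * u + (1 - beta) * sarah
    theta = theta_next

print(np.round(theta, 2))   # should land near mu = [1.0, -2.0]
```

Setting `beta = 1` recovers the plain unbiased estimator, while `beta = 0` recovers a pure SARAH recursion; the paper's analysis concerns intermediate values that trade a small bias for reduced variance.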
| Additional Metadata | |
|---|---|
| IBM Research, Thomas J. Watson Research Center, USA | |
| International Conference on Artificial Intelligence and Statistics | |
| Organisation | Computer Security |
| Pham, N., Nguyen, L., Phan, D., Nguyen, P. H., van Dijk, M., & Tran-Dinh, Q. (2020). A hybrid stochastic policy gradient algorithm for reinforcement learning. In Proceedings of the International Conference on Artificial Intelligence and Statistics (pp. 374–385). | |