A hybrid stochastic policy gradient algorithm for reinforcement learning

Pham, Nhan; M. Nguyen, Lam; Phan, Dzung; Nguyen, Phuong Ha; van Dijk, Marten; Tran-Dinh, Quoc

N.H. Pham (Nhan), L. M. Nguyen (Lam), D.T. Phan (Dzung), P.H. Nguyen (Phuong Ha), M.E. van Dijk (Marten) and Q. Tran-Dinh (Quoc)

2020-03-01


Presented at the International Conference on Artificial Intelligence and Statistics (August 2020), Online

We propose a novel hybrid stochastic policy gradient estimator by combining an unbiased policy gradient estimator, the REINFORCE estimator, with another biased one, an adapted SARAH estimator for policy optimization. The hybrid policy gradient estimator is shown to be biased, but has a variance-reduced property. Using this estimator, we develop a new Proximal Hybrid Stochastic Policy Gradient Algorithm (ProxHSPGA) to solve a composite policy optimization problem that allows us to handle constraints or regularizers on the policy parameters. We first propose a single-looped algorithm, then introduce a more practical restarting variant. We prove that both algorithms can achieve the best-known trajectory complexity O(ε^{-3}) to attain a first-order stationary point for the composite problem, which is better than existing REINFORCE/GPOMDP O(ε^{-4}) and SVRPG O(ε^{-10/3}) in the non-composite setting. We evaluate the performance of our algorithm on several well-known examples in reinforcement learning. Numerical results show that our algorithm outperforms two existing methods on these examples. Moreover, the composite settings indeed have some advantages compared to the non-composite ones on certain problems.
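The abstract does not state the estimator or the update rule, so the following is only a minimal sketch of the general idea, under stated assumptions: the hybrid estimator is taken to be a convex combination, with a hypothetical weight `beta`, of a SARAH-style recursive term and a fresh unbiased (REINFORCE-style) gradient, and the regularizer is assumed to be an L1 penalty with a hypothetical weight `lam`. The gradients are passed in as plain lists. Sampling trajectories and importance-weighting the gradient at the previous iterate are left to the caller.

```python
import math


def hybrid_estimate(v_prev, grad_new, grad_old, grad_unbiased, beta):
    """Assumed form: v = beta * (v_prev + grad_new - grad_old) + (1 - beta) * grad_unbiased.

    grad_new and grad_old are stochastic gradients at the current and previous
    iterate (grad_old is assumed to be importance-weighted already);
    grad_unbiased is an independent unbiased gradient at the current iterate.
    """
    return [
        beta * (vp + gn - go) + (1.0 - beta) * gu
        for vp, gn, go, gu in zip(v_prev, grad_new, grad_old, grad_unbiased)
    ]


def prox_l1(x, threshold):
    """Soft-thresholding: the proximal operator of threshold * ||x||_1."""
    return [math.copysign(max(abs(xi) - threshold, 0.0), xi) for xi in x]


def prox_ascent_step(theta, v, eta, lam):
    """One proximal gradient-ascent step on J(theta) - lam * ||theta||_1 (assumed regularizer)."""
    moved = [t + eta * vi for t, vi in zip(theta, v)]
    return prox_l1(moved, eta * lam)


# Usage with made-up numbers, purely for illustration.
v = hybrid_estimate([1.0], [2.0], [1.5], [4.0], beta=0.5)
theta_next = prox_ascent_step([0.0], v, eta=0.1, lam=0.5)
```

Setting `beta = 0` recovers the plain unbiased estimator and `beta = 1` recovers the purely recursive one. Values in between trade bias against variance, which matches the abstract's description of a biased but variance-reduced estimator.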

Additional Metadata
Stakeholder	IBM Research, Thomas J. Watson Research Center, USA
Conference	International Conference on Artificial Intelligence and Statistics
Organisation	Computer Security
Citation	Pham, N., Nguyen, L., Phan, D., Nguyen, P. H., van Dijk, M., & Tran-Dinh, Q. (2020). A hybrid stochastic policy gradient algorithm for reinforcement learning. In Proceedings of the International Conference on Artificial Intelligence and Statistics (pp. 374–385).

