2015
Learning Optimal Policies in Markov Decision Processes with Value Function Discovery
Publication
Presented at the International Symposium on Computer Performance, Modeling, Measurements and Evaluation, Sydney, Australia
In this paper we describe recent progress in our work on
Value Function Discovery (VFD), a novel method for discovery
of value functions for Markov Decision Processes (MDPs).
In a previous paper we described how VFD discovers algebraic
descriptions of value functions (and the corresponding
policies) using ideas from the Evolutionary Algorithm field.
A special feature of VFD is that the descriptions include the
model parameters of the MDP. We extend that work and
show how additional information about the structure of the
MDP can be included in VFD. This alternative use of VFD
still yields near-optimal policies and is much faster. Besides the performance gains and
improved run times, this approach illustrates that VFD
is not restricted to learning value functions and can be
applied more generally.
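The abstract's core idea, evolving algebraic descriptions of value functions with an Evolutionary Algorithm, can be illustrated with a small sketch. The MDP below (a toy admission-control queue), the quadratic candidate form `V(s) = a*s^2 + b*s + c`, and the Bellman-residual fitness are all illustrative assumptions, not the paper's actual model or fitness function:

```python
import random

N, GAMMA = 10, 0.9          # queue capacity and discount factor (assumed)
STATES = range(N + 1)

def q_value(s, a, V):
    # Cost-to-go of action a in state s for a hypothetical queue:
    # a=1 admits a job (reward 2, i.e. cost -2) unless the queue is full;
    # one job is then served with probability 0.5; holding cost = queue length.
    s2 = min(s + 1, N) if a == 1 else s
    cost = s - (2 if a == 1 and s < N else 0)
    return cost + GAMMA * (0.5 * V(s2) + 0.5 * V(max(s2 - 1, 0)))

def residual(theta):
    # Fitness of a candidate: total Bellman residual of V(s) = a*s^2 + b*s + c.
    a, b, c = theta
    V = lambda s: a * s * s + b * s + c
    return sum(abs(V(s) - min(q_value(s, 0, V), q_value(s, 1, V)))
               for s in STATES)

def evolve(pop_size=30, gens=200, seed=0):
    rng = random.Random(seed)
    pop = [[rng.uniform(-5, 5) for _ in range(3)] for _ in range(pop_size)]
    first_best = min(residual(t) for t in pop)
    for _ in range(gens):
        pop.sort(key=residual)
        elite = pop[: pop_size // 2]          # elitist truncation selection
        pop = elite + [[g + rng.gauss(0, 0.3) for g in rng.choice(elite)]
                       for _ in range(pop_size - len(elite))]
    return min(pop, key=residual), first_best

best, first_best = evolve()
V = lambda s: best[0] * s * s + best[1] * s + best[2]
# The discovered value function induces a policy by acting greedily:
policy = [min((0, 1), key=lambda a: q_value(s, a, V)) for s in STATES]
```

Because the coefficients of `V` are explicit, the discovered description can be read off algebraically; in the paper this explicitness is what lets the MDP's model parameters appear directly in the discovered expression.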
Additional Metadata

| | |
|---|---|
| Project | Realisation of Reliable and Secure Residential Sensor Platforms |
| Conference | International Symposium on Computer Performance, Modeling, Measurements and Evaluation |
| Organisation | Stochastics |
Onderwater, M., Bhulai, S., & van der Mei, R. (2015). Learning Optimal Policies in Markov Decision Processes with Value Function Discovery. In Proceedings of the International Symposium on Computer Performance, Modeling, Measurements and Evaluation (IFIP WG 7.3 Performance).