2015
Learning Optimal Policies in Markov Decision Processes with Value Function Discovery
Publication
Presented at the International Symposium on Computer Performance, Modeling, Measurements and Evaluation, Sydney, Australia
In this paper we describe recent progress in our work on
Value Function Discovery (VFD), a novel method for discovery
of value functions for Markov Decision Processes (MDPs).
In a previous paper we described how VFD discovers algebraic
descriptions of value functions (and the corresponding
policies) using ideas from the Evolutionary Algorithm field.
A special feature of VFD is that the descriptions include the
model parameters of the MDP. We extend that work and
show how additional information about the structure of the
MDP can be included in VFD. This alternative use of VFD
still yields near-optimal policies and is much faster. Besides the performance gains and
improved run times, this approach illustrates that VFD
is not restricted to learning value functions and can be
applied more generally.
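The abstract's core idea, evolving algebraic descriptions of value functions with an Evolutionary Algorithm, can be illustrated with a small sketch. The MDP below (a toy admission-control queue), the quadratic candidate form `V(s) = a*s^2 + b*s + c`, and the Bellman-residual fitness are all illustrative assumptions, not the paper's actual model or fitness function:

```python
import random

N, GAMMA = 10, 0.9          # queue capacity and discount factor (assumed)
STATES = range(N + 1)

def q_value(s, a, V):
    # Cost-to-go of action a in state s for a hypothetical queue:
    # a=1 admits a job (reward 2, i.e. cost -2) unless the queue is full;
    # one job is then served with probability 0.5; holding cost = queue length.
    s2 = min(s + 1, N) if a == 1 else s
    cost = s - (2 if a == 1 and s < N else 0)
    return cost + GAMMA * (0.5 * V(s2) + 0.5 * V(max(s2 - 1, 0)))

def residual(theta):
    # Fitness of a candidate: total Bellman residual of V(s) = a*s^2 + b*s + c.
    a, b, c = theta
    V = lambda s: a * s * s + b * s + c
    return sum(abs(V(s) - min(q_value(s, 0, V), q_value(s, 1, V)))
               for s in STATES)

def evolve(pop_size=30, gens=200, seed=0):
    rng = random.Random(seed)
    pop = [[rng.uniform(-5, 5) for _ in range(3)] for _ in range(pop_size)]
    first_best = min(residual(t) for t in pop)
    for _ in range(gens):
        pop.sort(key=residual)
        elite = pop[: pop_size // 2]          # elitist truncation selection
        pop = elite + [[g + rng.gauss(0, 0.3) for g in rng.choice(elite)]
                       for _ in range(pop_size - len(elite))]
    return min(pop, key=residual), first_best

best, first_best = evolve()
V = lambda s: best[0] * s * s + best[1] * s + best[2]
# The discovered value function induces a policy by acting greedily:
policy = [min((0, 1), key=lambda a: q_value(s, a, V)) for s in STATES]
```

Because the coefficients of `V` are explicit, the discovered description can be read off algebraically; in the paper this explicitness is what lets the MDP's model parameters appear directly in the discovered expression.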
Additional Metadata

| | |
|---|---|
| Project | Realisation of Reliable and Secure Residential Sensor Platforms |
| Conference | International Symposium on Computer Performance, Modeling, Measurements and Evaluation |
| Organisation | Stochastics |
Onderwater, M., Bhulai, S., & van der Mei, R. (2015). Learning Optimal Policies in Markov Decision Processes with Value Function Discovery. In Proceedings of the International Symposium on Computer Performance, Modeling, Measurements and Evaluation (IFIP WG 7.3 Performance).