2015
Value Function Discovery in Markov Decision Processes with Evolutionary Algorithms
Publication
In this paper we introduce a novel method for
discovery of value functions for Markov Decision Processes
(MDPs). This method, which we call Value Function Discovery
(VFD), is based on ideas from the Evolutionary Algorithm field.
VFD’s key feature is that it discovers descriptions of value
functions that are algebraic in nature. This feature is unique,
because the descriptions include the model parameters of the
MDP. The algebraic expression of the value function discovered by
VFD can be used in several scenarios, e.g., conversion to a policy
(with one-step policy improvement) or control of systems with
time-varying parameters. The work in this paper is a first step
towards exploring potential usage scenarios of discovered value
functions. We give a detailed description of VFD and illustrate its
application on an example MDP. For this MDP we let VFD discover
an algebraic description of a value function that closely resembles
the optimal value function. The discovered value function is
then used to obtain a policy, which we compare numerically
to the optimal policy of the MDP. The resulting policy shows
near-optimal performance on a wide range of model parameters.
Finally, we identify and discuss future application scenarios of
discovered value functions.
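
The conversion of a value function to a policy by one-step policy improvement, mentioned above, can be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's implementation: the two-state MDP, the names `one_step_improvement`, `reward`, and `value`, the discount factor, and the algebraic value expression are all hypothetical. The expression stands in for one that VFD might discover, and it depends on a model parameter `r` so that the same formula can be re-evaluated when the parameter changes.

```python
from typing import Callable, Dict, List, Tuple

State = int
Action = str
# transitions[(s, a)] is a list of (probability, next_state) pairs.
Transitions = Dict[Tuple[State, Action], List[Tuple[float, State]]]


def one_step_improvement(
    states: List[State],
    actions: List[Action],
    transitions: Transitions,
    reward: Callable[[State, Action], float],
    value: Callable[[State], float],
    discount: float,
) -> Dict[State, Action]:
    """Return the policy that is greedy with respect to `value`."""
    policy: Dict[State, Action] = {}
    for s in states:
        def q(a: Action) -> float:
            # Immediate reward plus discounted expected value of the next state.
            expected_next = sum(p * value(s2) for p, s2 in transitions[(s, a)])
            return reward(s, a) + discount * expected_next

        policy[s] = max(actions, key=q)
    return policy


# Hypothetical two-state MDP: "stay" keeps the state, "move" switches it.
states = [0, 1]
actions = ["stay", "move"]
transitions: Transitions = {
    (0, "stay"): [(1.0, 0)],
    (0, "move"): [(1.0, 1)],
    (1, "stay"): [(1.0, 1)],
    (1, "move"): [(1.0, 0)],
}

r = 1.0          # hypothetical model parameter: reward per step in state 1
discount = 0.9   # hypothetical discount factor
move_cost = 0.1  # hypothetical cost of switching states


def reward(s: State, a: Action) -> float:
    return r * s - (move_cost if a == "move" else 0.0)


def value(s: State) -> float:
    # Hypothetical algebraic value expression in terms of r and discount.
    return r * s / (1.0 - discount)


policy = one_step_improvement(states, actions, transitions, reward, value, discount)
print(policy)
```

Because `value` is an expression in the model parameters rather than a table of numbers, changing `r` and calling `one_step_improvement` again yields a policy for the new parameter value without solving the MDP again.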
Additional Metadata | |
---|---|
Publisher | IEEE |
DOI | doi.org/10.1109/TSMC.2015.2475716 |
Journal | IEEE Transactions on Systems, Man, and Cybernetics: Systems |
Project | Realisation of Reliable and Secure Residential Sensor Platforms |
Organisation | Stochastics |
Onderwater, M., Bhulai, S., & van der Mei, R. (2015). Value Function Discovery in Markov Decision Processes with Evolutionary Algorithms. IEEE Transactions on Systems, Man, and Cybernetics: Systems. doi:10.1109/TSMC.2015.2475716 |