Safe reinforcement learning using risk mapping by similarity
Reinforcement learning (RL) has been used successfully to solve sequential decision problems. However, accounting for risk during the learning process remains an open research problem. In this work, we are interested in the type of risk that can lead to a catastrophic state. Related works that aim to deal with risk propose complex models. In contrast, we follow a simple yet effective idea: similar states might lead to similar risk. Using this idea, we propose risk mapping by similarity (RMS), an algorithm for discrete scenarios that infers the risk of newly discovered states by analyzing how similar they are to previously known risky states. In general terms, the RMS algorithm transfers the knowledge the agent has gathered about risk to newly discovered states. We contribute a new similarity-based approach to considering risk, together with RMS, which is simple and generalizable as long as the premise that similar states yield similar risk holds. RMS is not an RL algorithm, but a method for generating a risk-aware reward shaping signal that can be used with an RL algorithm to generate risk-aware policies.
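To make the idea in the abstract concrete, the following is a minimal illustrative sketch, not the authors' RMS algorithm. It assumes grid-cell states, a hypothetical similarity measure that decays with Manhattan distance, and a simple rule that takes the largest similarity-weighted risk among known risky states. The names `similarity`, `infer_risk`, `shaped_reward`, `scale`, and `weight` are all assumptions introduced for illustration.

```python
from typing import Dict, Tuple

# Hypothetical state representation: a cell in a discrete grid world.
State = Tuple[int, int]


def similarity(a: State, b: State, scale: float = 1.0) -> float:
    """Assumed similarity measure: 1.0 for identical states,
    decaying toward 0 as the Manhattan distance grows."""
    dist = abs(a[0] - b[0]) + abs(a[1] - b[1])
    return 1.0 / (1.0 + dist / scale)


def infer_risk(state: State, known_risk: Dict[State, float]) -> float:
    """Estimate the risk of a state from previously known risky states.

    Assumption: the inferred risk is the largest similarity-weighted
    risk over all known risky states (0.0 if none are known yet).
    """
    if not known_risk:
        return 0.0
    return max(similarity(state, s) * r for s, r in known_risk.items())


def shaped_reward(
    reward: float,
    next_state: State,
    known_risk: Dict[State, float],
    weight: float = 1.0,
) -> float:
    """Risk-aware shaping: penalize the environment reward by the
    inferred risk of the state the agent is about to enter."""
    return reward - weight * infer_risk(next_state, known_risk)


if __name__ == "__main__":
    # One catastrophic state has been observed at the origin.
    known = {(0, 0): 1.0}
    for cell in [(0, 0), (0, 1), (3, 0)]:
        print(cell, infer_risk(cell, known), shaped_reward(1.0, cell, known))
```

In this sketch the shaped reward could be fed to any tabular RL method, such as Q-learning, in place of the raw reward. The choice of similarity function and of how risks are combined is where a real implementation would differ most.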
Keywords: Reinforcement learning, Reward shaping, Risk, Risk map, Safe reinforcement learning
Organisation: Centrum Wiskunde & Informatica, Amsterdam, The Netherlands
Serrano, J, Morales, E.F, & Hernandez-Leal, P. (2019). Safe reinforcement learning using risk mapping by similarity. Adaptive Behavior. doi:10.1177/1059712319859650