Agents acting in real-world scenarios often face constraints such as finite budgets or daily job performance targets. While repeated (episodic) tasks can be solved with existing RL algorithms, these methods need to be extended when the repetition itself depends on performance. Recent work has introduced a distributional perspective on reinforcement learning, providing a model of episodic returns. Inspired by these results, we contribute the new budget- and risk-aware distributional reinforcement learning (BRAD-RL) algorithm, which bootstraps from the C51 distributional output and then uses value iteration to estimate the value of starting an episode with a certain amount of budget. With this strategy we can select actions within each episode according to the available budget and maximize the return across episodes. Experiments in a grid-world domain highlight the benefits of our algorithm, which maximizes discounted future returns when low cumulative performance may terminate repetition.
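
The budget-level value iteration mentioned in the abstract can be illustrated with a minimal sketch. This is one possible reading of the idea, not the paper's implementation. It rests on several assumptions that do not come from the abstract: budgets are integers capped at a hypothetical `b_max`, repetition ends once the budget drops to zero or below, and each candidate behaviour is summarised by a fixed categorical distribution over integer episode returns (standing in for a learned C51 output). The names `budget_value_iteration` and `return_dists` are illustrative only.

```python
def budget_value_iteration(return_dists, b_max, gamma=0.9, tol=1e-8, max_iters=10_000):
    """Estimate the value of starting an episode with budget b (assumed model).

    return_dists: dict mapping a behaviour name to {integer_return: probability},
                  a hypothetical stand-in for a C51 return distribution.
    b_max:        assumed cap on the budget; larger budgets are clipped to it.
    """
    # values[0] stays 0: with no budget left, no further episodes are played.
    values = [0.0] * (b_max + 1)
    policy = [None] * (b_max + 1)

    for _ in range(max_iters):
        delta = 0.0
        for b in range(1, b_max + 1):
            best_name, best_q = None, float("-inf")
            for name, dist in return_dists.items():
                q = 0.0
                for ret, prob in dist.items():
                    next_b = min(b + ret, b_max)
                    # Assumption: repetition terminates when the budget is exhausted.
                    future = values[next_b] if next_b > 0 else 0.0
                    q += prob * (ret + gamma * future)
                if q > best_q:
                    best_name, best_q = name, q
            delta = max(delta, abs(best_q - values[b]))
            values[b] = best_q  # in-place (Gauss-Seidel style) update
            policy[b] = best_name
        if delta < tol:
            break
    return values, policy


# Illustrative distributions: a safe behaviour and a risky one with a higher mean.
example_dists = {
    "safe": {1: 1.0},
    "risky": {-2: 0.5, 5: 0.5},
}
values, policy = budget_value_iteration(example_dists, b_max=10)
```

Here `policy[b]` names the behaviour with the highest estimated value when an episode starts with budget `b`, which is the sense in which action selection becomes budget-dependent in this sketch.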

Demand response for grid-friendly quasi-autarkic energy cooperatives
Adaptive and Learning Agents
Intelligent and autonomous systems

Serrano, J., Morales, E. F., Hernandez-Leal, P., Bloembergen, D., & Kaisers, M. (2018). Learning on a Budget Using Distributional RL. In Proceedings of the Adaptive and Learning Agents (ALA) Workshop.