Problems in operations research are typically combinatorial and high-dimensional. Linear programming can efficiently solve such large decision problems, but only to a degree. Stochastic multi-period problems often must be decomposed into a sequence of one-stage decisions with approximated downstream effects, e.g., by deploying reinforcement learning to obtain value function approximations (VFAs). When such VFAs are embedded into one-stage linear programs, however, VFA design is restricted to linear forms. This paper presents an integrated simulation approach for such complex optimization problems, developing a deep reinforcement learning algorithm that combines linear programming with neural network VFAs. Our method embeds neural network VFAs into one-stage linear decision problems, combining the nonlinear expressive power of neural networks with the efficiency of solving linear programs. As a proof of concept, we perform numerical experiments on a transportation problem. The neural network VFAs consistently outperform polynomial VFAs as well as other benchmarks, with limited design and tuning effort.
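To illustrate the idea of embedding a neural network VFA into a one-stage linear decision problem, the sketch below uses the standard big-M reformulation of a ReLU unit, which keeps the one-stage problem a (mixed-integer) linear program. All numbers, the toy single-hidden-unit network, and the one-dimensional inventory-style state are illustrative assumptions, not the paper's actual model or formulation:

```python
# Hedged sketch: big-M embedding of a tiny ReLU-network VFA into a one-stage MILP.
# Toy problem (assumed): integer action x in [0, s], immediate reward r*x,
# next state s' = s - x, downstream value V(s') = w2 * relu(w1*s' + b1) + b2.
import numpy as np
from scipy.optimize import milp, LinearConstraint, Bounds

s, r = 5.0, 1.0                       # current state and per-unit reward (toy values)
w1, b1, w2, b2 = -2.0, 6.0, -1.0, 0.0 # toy trained-network weights
M = 100.0                             # big-M bound on the hidden unit's activation

# Decision vector: [x, a, z, delta], where a is the pre-activation,
# z = relu(a), and delta is a binary indicator for the ReLU being active.
c = np.array([-r, 0.0, -w2, 0.0])     # minimize -(r*x + w2*z); b2 is a constant

constraints = [
    LinearConstraint([w1, 1, 0, 0], w1 * s + b1, w1 * s + b1),  # a = w1*(s - x) + b1
    LinearConstraint([0, -1, 1, 0], 0, np.inf),                 # z >= a
    LinearConstraint([0, -1, 1, M], -np.inf, M),                # z <= a + M*(1 - delta)
    LinearConstraint([0, 0, 1, -M], -np.inf, 0),                # z <= M*delta
]
bounds = Bounds([0, -np.inf, 0, 0], [s, np.inf, np.inf, 1])
integrality = np.array([1, 0, 0, 1])  # x and delta are integer variables

res = milp(c=c, constraints=constraints, integrality=integrality, bounds=bounds)
x_opt, value = res.x[0], -res.fun + b2
print(x_opt, value)  # optimal action and its estimated total value
```

Because the ReLU nonlinearity is expressed with linear constraints and one binary variable per hidden unit, the one-stage problem remains solvable by off-the-shelf (MI)LP machinery; this is one standard encoding, not necessarily the exact formulation used in the paper.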

doi.org/10.1109/WSC48552.2020.9384078
Winter Simulation Conference
Centrum Wiskunde & Informatica, Amsterdam, The Netherlands

van Heeswijk, W.J.A., & La Poutré, J.A. (2021). Deep reinforcement learning in linear discrete action spaces. In Proceedings of the Winter Simulation Conference (pp. 1063–1074). doi:10.1109/WSC48552.2020.9384078