Problems in operations research are typically combinatorial and high-dimensional. To a degree, linear programming can solve such large decision problems efficiently. Stochastic multi-period problems, however, often need to be decomposed into a sequence of one-stage decisions whose downstream effects are approximated, e.g., by deploying reinforcement learning to obtain value function approximations (VFAs). When such VFAs are embedded into one-stage linear programs, their design is restricted by linearity. This paper presents an integrated simulation approach for such complex optimization problems, developing a deep reinforcement learning algorithm that combines linear programming with neural network VFAs. Our proposed method embeds neural network VFAs into one-stage linear decision problems, combining the nonlinear expressive power of neural networks with the efficiency of solving linear programs. As a proof of concept, we perform numerical experiments on a transportation problem. The neural network VFAs consistently outperform polynomial VFAs as well as other benchmarks, with limited design and tuning effort.

DOI: https://doi.org/10.1109/WSC48552.2020.9384078
Winter Simulation Conference 2020
Track: Intelligent and autonomous systems

van Heeswijk, W., & La Poutré, H. (2021). Deep reinforcement learning in linear discrete action spaces. In Proceedings of the Winter Simulation Conference (pp. 1063–1074). doi:10.1109/WSC48552.2020.9384078
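The abstract describes embedding a trained neural network VFA into a one-stage linear decision problem. One common way to do this for ReLU networks is to encode each activation with big-M constraints, so that the one-stage problem becomes a small mixed-integer linear program. The sketch below, a minimal illustration using the PuLP solver with hypothetical network weights, decision bounds, and reward coefficients, shows that idea; it is not necessarily the exact formulation used in the paper.

import pulp

# Hypothetical weights of a tiny trained ReLU value function approximation
# (2 inputs, 2 hidden units, scalar output); illustrative numbers only.
W1 = [[0.5, -0.2], [0.1, 0.3]]
b1 = [0.0, 0.1]
W2 = [1.0, -0.5]
b2 = 0.2

M = 100.0  # big-M bound on pre-activation magnitudes (problem-specific assumption)

prob = pulp.LpProblem("one_stage_with_vfa", pulp.LpMaximize)

# Decision variables, e.g., shipment quantities (hypothetical bounds).
x = [pulp.LpVariable(f"x_{i}", lowBound=0, upBound=10) for i in range(2)]

# Post-decision state fed into the VFA; here taken equal to the decision (assumption).
s = x

# Encode each hidden unit h_j = max(0, W1[j].s + b1[j]) exactly via big-M constraints.
h = []
for j in range(2):
    pre = pulp.lpSum(W1[j][k] * s[k] for k in range(2)) + b1[j]
    h_j = pulp.LpVariable(f"h_{j}", lowBound=0)
    z_j = pulp.LpVariable(f"z_{j}", cat="Binary")
    prob += h_j >= pre                  # h_j cannot undercut the pre-activation
    prob += h_j <= pre + M * (1 - z_j)  # z_j = 1 pins h_j to the pre-activation
    prob += h_j <= M * z_j              # z_j = 0 pins h_j to zero
    h.append(h_j)

# Objective: immediate linear reward (hypothetical coefficients) plus the VFA output.
vfa_output = pulp.lpSum(W2[j] * h[j] for j in range(2)) + b2
prob += pulp.lpSum(1.5 * x[i] for i in range(2)) + vfa_output

prob.solve(pulp.PULP_CBC_CMD(msg=False))
print([v.value() for v in x], pulp.value(prob.objective))

The binary variable z_j selects the active or inactive side of each ReLU, so the solver recovers the network's exact output for the chosen decision as long as M bounds the pre-activations; the remainder of the model stays linear, which is what allows standard mixed-integer linear programming solvers to handle the combined one-stage problem.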