2025-11-28
Contextual value iteration and deep approximation for Bayesian contextual bandits
Publication
Presented at the 39th Conference on Neural Information Processing Systems (NeurIPS 2025) Workshop: MLxOR: Mathematical Foundations and Operational Integration of Machine Learning for Uncertainty-Aware Decision-Making
We present a Bayesian value-iteration framework for contextual multi-armed bandit problems that treats the agent's posterior distribution over the pay-off as the state of a Markov Decision Process. We place finite-dimensional priors on the unknown reward parameters and on the exogenous context transition kernel. Value iteration on the resulting belief-MDP yields an optimal policy. We illustrate the approach in an airline seat-pricing simulation. To address the curse of dimensionality, we approximate the value function with a dual-stream deep neural network and benchmark our deep value iteration algorithm on a standard contextual bandit instance.
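To illustrate the belief-MDP idea, the following is a minimal sketch (not the paper's implementation): finite-horizon value iteration where the state is the agent's Beta posterior for each arm of a two-armed Bernoulli bandit. The context dimension, the airline pricing setting, and the deep dual-stream approximation are omitted for brevity; the horizon and prior values are arbitrary assumptions.

```python
# Sketch only: value iteration on the belief-MDP of a two-armed Bernoulli bandit.
# The "state" is the posterior ((a1, b1), (a2, b2)) of Beta parameters per arm.
from functools import lru_cache

HORIZON = 10  # assumed planning horizon


@lru_cache(maxsize=None)
def value(belief, t):
    """Optimal expected remaining reward from posterior `belief` at step t."""
    if t == HORIZON:
        return 0.0
    best = float("-inf")
    for arm, (a, b) in enumerate(belief):
        p = a / (a + b)  # posterior mean success probability of this arm
        # Bayesian update of the belief for each possible outcome of the pull.
        win = list(belief); win[arm] = (a + 1, b)
        lose = list(belief); lose[arm] = (a, b + 1)
        q = p * (1.0 + value(tuple(win), t + 1)) + (1 - p) * value(tuple(lose), t + 1)
        best = max(best, q)
    return best


def best_arm(belief, t):
    """Greedy action with respect to the belief-MDP value function."""
    def q(arm):
        a, b = belief[arm]
        p = a / (a + b)
        win = list(belief); win[arm] = (a + 1, b)
        lose = list(belief); lose[arm] = (a, b + 1)
        return p * (1.0 + value(tuple(win), t + 1)) + (1 - p) * value(tuple(lose), t + 1)
    return max(range(len(belief)), key=q)


if __name__ == "__main__":
    prior = ((1, 1), (1, 1))  # uniform Beta(1, 1) priors on both arms
    print("optimal expected reward:", value(prior, 0))
    print("first pull:", best_arm(prior, 0))
```

In this toy version the belief space stays small enough to enumerate; the paper's deep approximation addresses the case where the posterior (and context) state grows too large for exact value iteration.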
| Additional Metadata | |
|---|---|
| Conference | 39th Conference on Neural Information Processing Systems (NeurIPS 2025) Workshop: MLxOR: Mathematical Foundations and Operational Integration of Machine Learning for Uncertainty-Aware Decision-Making |
| Organisation | Stochastics |
| Citation | Duijndam, K., Koole, G., & van der Mei, R. (2025). Contextual value iteration and deep approximation for Bayesian contextual bandits. In Proceedings NeurIPS (Annual Conference on Neural Information Processing Systems) (pp. 18:1–18:5). |