We present a Bayesian value-iteration framework for contextual multi-armed bandit problems that treats the agent's posterior distribution over the payoff as the state of a Markov decision process. We place finite-dimensional priors on the unknown reward parameters and on the exogenous context transition kernel. Value iteration on the resulting belief-MDP yields a Bayes-optimal policy. We illustrate the approach in an airline seat-pricing simulation. To address the curse of dimensionality, we approximate the value function with a dual-stream deep neural network and benchmark the resulting deep value-iteration algorithm on a standard contextual bandit instance.
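The belief-MDP construction in the abstract can be sketched in miniature. The following is a hypothetical illustration, not the paper's method: it assumes a context-free two-armed Bernoulli bandit, so the belief state reduces to a tuple of Beta-posterior parameters, and exact value iteration is a finite-horizon Bellman backup over posterior updates.

```python
from functools import lru_cache

HORIZON = 5  # number of remaining pulls (illustrative choice)

@lru_cache(maxsize=None)
def value(t, beliefs):
    """Bayes-optimal value at time t for a belief state given as a
    tuple of (alpha, beta) Beta-posterior parameters, one per arm."""
    if t == HORIZON:
        return 0.0
    best = float("-inf")
    for i, (a, b) in enumerate(beliefs):
        p = a / (a + b)  # posterior mean reward of arm i
        # Bellman backup over the two possible observations
        # (success/failure), each updating the pulled arm's posterior.
        succ = list(beliefs); succ[i] = (a + 1, b)
        fail = list(beliefs); fail[i] = (a, b + 1)
        q = p * (1.0 + value(t + 1, tuple(succ))) \
            + (1.0 - p) * value(t + 1, tuple(fail))
        best = max(best, q)
    return best

# Expected total reward over HORIZON pulls from uniform Beta(1,1) priors;
# exceeds the myopic baseline of 0.5 * HORIZON because the policy
# values the information gained from each pull.
v0 = value(0, ((1, 1), (1, 1)))
```

With context, the state would additionally carry the current context and the backup would integrate over the context transition kernel, which is what makes the exact recursion intractable and motivates the deep approximation described in the abstract.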

39th Conference on Neural Information Processing Systems (NeurIPS 2025) Workshop: MLxOR: Mathematical Foundations and Operational Integration of Machine Learning for Uncertainty-Aware Decision-Making

Duijndam, K., Koole, G., & van der Mei, R. (2025). Contextual value iteration and deep approximation for Bayesian contextual bandits. In Proceedings NeurIPS (Annual Conference on Neural Information Processing Systems) (pp. 18:1–18:5).