Regret-minimization in risk-averse bandits

Agrawal, Shubhada; Juneja, Sandeep; Koolen-Wijkstra, Wouter

doi:10.1109/ICC54714.2021.9703134

S. Agrawal (Shubhada), S. Juneja (Sandeep) and W.M. Koolen-Wijkstra (Wouter)

2021-12-20

Presented at the 7th Indian Control Conference, ICC 2021 (December 2021), Bombay, India

Classical regret minimization in a bandit framework involves a number of probability distributions, or arms, that are not known to the learner but that can be sampled from, or pulled. The learner's aim is to sequentially pull these arms so as to maximize the number of times the best arm is pulled or, equivalently, to minimize the regret associated with the sub-optimal pulls. The best arm is classically defined as the one with the largest mean. Lower bounds on expected regret are well known, and lately, in great generality, efficient algorithms that match the lower bounds have been developed. In this paper we extend this methodology to a more general risk-reward setup where the best arm is the one with the lowest average loss (negative of reward), with a multiple of the Conditional Value-at-Risk (CVaR) of the loss distribution added to it. CVaR is a popular tail risk measure. The settings where risk becomes an important consideration typically involve heavy-tailed distributions. Unlike most of the previous literature, we allow for all distributions with a known uniform bound on the moment of order $1+\epsilon$, which accommodates heavy-tailed bandits. We extend the lower bound of the classical regret minimization setup to this setting and develop an index-based algorithm. Like the popular KL-UCB algorithm for the mean setting, our index is derived from the proposed lower bound and is based on the empirical likelihood principle. We also propose anytime-valid confidence intervals for the mean-CVaR trade-off metric. En route, we develop concentration inequalities, which may be of independent interest.
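To make the mean-CVaR trade-off metric from the abstract concrete, the following is a minimal Python sketch that computes a plug-in (empirical) estimate of it from observed losses and picks the arm with the lowest value. It is an illustration only, not the paper's index-based algorithm: the function names, the level `alpha`, and the multiplier `cvar_weight` are assumptions introduced here, and a naive plug-in estimate does not carry the guarantees that the paper derives for heavy-tailed distributions. CVaR is computed with the standard Rockafellar-Uryasev representation, CVaR_alpha(X) = min over c of c + E[(X - c)^+] / (1 - alpha).

```python
def empirical_cvar(losses, alpha):
    """Plug-in CVaR of the losses at level alpha (0 <= alpha < 1).

    Uses the Rockafellar-Uryasev formula. For an empirical distribution the
    objective is convex and piecewise linear with kinks at the sample points,
    so it is enough to minimise over the samples themselves.
    """
    if not losses:
        raise ValueError("need at least one loss sample")
    if not 0.0 <= alpha < 1.0:
        raise ValueError("alpha must lie in [0, 1)")
    n = len(losses)

    def objective(c):
        expected_excess = sum(max(x - c, 0.0) for x in losses) / n
        return c + expected_excess / (1.0 - alpha)

    return min(objective(c) for c in losses)


def mean_cvar_metric(losses, alpha, cvar_weight):
    """Average loss plus cvar_weight times the empirical CVaR (lower is better)."""
    mean_loss = sum(losses) / len(losses)
    return mean_loss + cvar_weight * empirical_cvar(losses, alpha)


def best_arm(samples_by_arm, alpha, cvar_weight):
    """Name of the arm whose plug-in mean-CVaR metric is smallest."""
    return min(
        samples_by_arm,
        key=lambda arm: mean_cvar_metric(samples_by_arm[arm], alpha, cvar_weight),
    )


# Hypothetical data: "steady" always loses 1; "risky" usually loses 0 but
# occasionally loses 5.
arms = {
    "steady": [1.0] * 10,
    "risky": [0.0] * 9 + [5.0],
}
risk_neutral_choice = best_arm(arms, alpha=0.8, cvar_weight=0.0)
risk_averse_choice = best_arm(arms, alpha=0.8, cvar_weight=1.0)
```

With `cvar_weight=0.0` the metric reduces to the average loss and the "risky" arm is preferred; with `cvar_weight=1.0` the tail penalty makes the "steady" arm preferable, which is the kind of change in the notion of "best" that the risk-averse setup introduces.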

Additional Metadata
Persistent URL	doi.org/10.1109/ICC54714.2021.9703134
Conference	7th Indian Control Conference, ICC 2021
Organisation	Centrum Wiskunde & Informatica, Amsterdam (CWI), The Netherlands
Citation	Agrawal, S., Juneja, S., & Koolen-Wijkstra, W. (2021). Regret-minimization in risk-averse bandits. In Proceedings of the Indian Control Conference, ICC (pp. 195–200). doi:10.1109/ICC54714.2021.9703134
