Conditionally Risk-Averse Contextual Bandits

Mónika Farsang, Paul Mineiro, Wangda Zhang
DOI: https://doi.org/10.48550/arXiv.2210.13573
IF: 5.414
2022-10-24
Machine Learning
Abstract: We desire to apply contextual bandits to scenarios where average-case statistical guarantees are inadequate. Happily, we discover that the composition of a reduction to online regression with expectile loss is analytically tractable, computationally convenient, and empirically effective. The result is the first risk-averse contextual bandit algorithm with an online regret guarantee. We state our precise regret guarantee and conduct experiments on diverse scenarios in dynamic pricing, inventory management, and self-tuning software, including results from a production exascale cloud data processing system.
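The key ingredient named in the abstract is the expectile loss. As a hedged illustration (not the authors' algorithm), the sketch below assumes the standard asymmetric squared-loss definition, L_tau(y, q) = |tau - 1{y < q}| * (y - q)^2, whose minimizer over q is the tau-expectile of y. For rewards, tau < 0.5 weights bad outcomes more heavily and so yields a pessimistic value estimate. The helper names, the two hypothetical reward distributions, and the plain SGD update standing in for an online regression oracle are all illustrative assumptions.

```python
def expectile_loss(y: float, q: float, tau: float) -> float:
    """Asymmetric squared loss |tau - 1{y < q}| * (y - q)^2 (assumed standard form)."""
    weight = (1.0 - tau) if y < q else tau
    return weight * (y - q) ** 2


def expectile_grad(y: float, q: float, tau: float) -> float:
    """Derivative of expectile_loss with respect to the prediction q."""
    weight = (1.0 - tau) if y < q else tau
    return -2.0 * weight * (y - q)


def batch_expectile(samples: list[float], tau: float, iters: int = 100) -> float:
    """Exact tau-expectile of a finite sample via bisection.

    The stationarity condition sum_i w_i(q) * (y_i - q) = 0 is strictly
    decreasing in q for 0 < tau < 1, so bisection on [min, max] converges.
    """
    lo, hi = min(samples), max(samples)
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        s = sum(((1.0 - tau) if y < mid else tau) * (y - mid) for y in samples)
        if s > 0:
            lo = mid  # weighted residuals positive: estimate is too low
        else:
            hi = mid
    return 0.5 * (lo + hi)


def online_expectile(stream: list[float], tau: float, lr: float = 0.01) -> float:
    """Hypothetical stand-in for an online regression oracle: constant-input
    SGD on the expectile loss. With a constant step size it hovers near,
    not exactly at, the expectile."""
    q = 0.0
    for y in stream:
        q -= lr * expectile_grad(y, q, tau)
    return q


if __name__ == "__main__":
    risky = [0.0, 0.0, 0.0, 10.0]  # hypothetical reward samples, mean 2.5
    safe = [2.0, 2.0, 2.0, 2.0]    # hypothetical reward samples, mean 2.0
    for tau in (0.5, 0.2):
        r = batch_expectile(risky, tau)
        s = batch_expectile(safe, tau)
        choice = "risky" if r > s else "safe"
        print(f"tau={tau}: risky={r:.3f} safe={s:.3f} -> prefer {choice}")
```

At tau = 0.5 the expectile equals the mean, so the risky arm (2.5) beats the safe arm (2.0). At tau = 0.2 the risky arm's expectile solves 2 - 2.6q = 0, giving q = 2/2.6 ≈ 0.77, so the risk-averse choice flips to the safe arm.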