Abstract: In a typical stochastic multi-armed bandit problem, the objective is often to maximize the expected sum of rewards over some time horizon $T$. A strategy that achieves this is optimal when no additional information is available, but this is no longer the case once environment-specific knowledge is provided. In particular, in areas of high volatility such as healthcare or finance, a naive reward-maximization approach often fails to capture the complexity of the learning problem and yields unreliable solutions. To tackle problems of this nature, we propose a framework of adaptive risk-aware strategies that operate in non-stationary environments. Our framework incorporates various risk measures prevalent in the literature to map multiple families of multi-armed bandit algorithms into a risk-sensitive setting. In addition, we equip the resulting algorithms with the Restarted Bayesian Online Change-Point Detection (R-BOCPD) algorithm and impose a (tunable) forced exploration strategy to detect local (per-arm) switches. We provide finite-time theoretical guarantees and an asymptotic regret bound of order $\tilde O(\sqrt{K_T T})$ up to time horizon $T$, where $K_T$ is the total number of change-points. In practice, our framework compares favorably to the state of the art in both synthetic and real-world environments and performs efficiently with respect to both risk-sensitivity and non-stationarity.
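To make the ingredients named in the abstract concrete, the following is a minimal sketch and not the paper's algorithm. It assumes empirical CVaR as the risk measure (the abstract does not say which measures are used), a UCB-style bonus as the base index, and a fixed exploration probability as the tunable forced exploration. R-BOCPD itself is not implemented; `restart_arm` only shows what a per-arm restart could look like once some detector flags a change. All function and parameter names are hypothetical.

```python
import math
import random


def empirical_cvar(samples, alpha):
    """Mean of the worst ceil(alpha * n) rewards (lower tail), alpha in (0, 1]."""
    if not samples:
        raise ValueError("need at least one sample")
    k = max(1, math.ceil(alpha * len(samples)))
    worst = sorted(samples)[:k]
    return sum(worst) / k


def choose_arm(history, t, alpha=0.2, explore_prob=0.05, rng=random):
    """Pick an arm from per-arm reward lists.

    Assumption: forced exploration is modeled as a uniform draw with
    probability explore_prob; otherwise the arm with the largest
    CVaR-plus-bonus index is played.
    """
    n_arms = len(history)
    # Play any arm that has no observations (initially or after a restart).
    for arm, rewards in enumerate(history):
        if not rewards:
            return arm
    # Forced exploration keeps every arm sampled so per-arm changes stay visible.
    if rng.random() < explore_prob:
        return rng.randrange(n_arms)

    def index(rewards):
        bonus = math.sqrt(2.0 * math.log(t + 1) / len(rewards))
        return empirical_cvar(rewards, alpha) + bonus

    return max(range(n_arms), key=lambda a: index(history[a]))


def restart_arm(history, arm):
    """Forget one arm's past rewards, as one might after a detected change-point."""
    history[arm] = []
```

In a full loop one would call `choose_arm`, append the observed reward to `history[arm]`, and call `restart_arm` whenever a change-point detector (R-BOCPD in the paper) signals a switch on that arm.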

Bandit Algorithm Driven by a Classical Random Walk and a Quantum Walk

Quantum exploration algorithms for multi-armed bandits

Multi-Armed Bandits and Quantum Channel Oracles

Decentralized Stochastic Multi-Player Multi-Armed Walking Bandits

Quantum Reinforcement Learning Method and Application Based on Value Function

Quantum Reinforcement Learning for Multi-Armed Bandits

Towards Fundamental Limits of Multi-armed Bandits with Random Walk Feedback

Quantum Multi-Armed Bandits and Stochastic Linear Bandits Enjoy Logarithmic Regrets

Forced Exploration in Bandit Problems

Random Walk Bandits

Experimental implementation of quantum-walk-based portfolio optimisation

Experimental implementation of the quantum random-walk algorithm

Planning and Learning in Risk-Aware Restless Multi-Arm Bandit Problem

Old Dog Learns New Tricks: Randomized UCB for Bandit Problems

A Multi-armed Bandit MCMC, with applications in sampling from doubly intractable posterior

Preparing random state for quantum financing with quantum walks

Efficient Multivariate Bandit Algorithm with Path Planning

On Abruptly-Changing and Slowly-Varying Multiarmed Bandit Problems

The Traveling Bandit: A Framework for Bayesian Optimization with Movement Costs

A Risk-Averse Framework for Non-Stationary Stochastic Multi-Armed Bandits

Bandits with Concave Aggregated Reward