Mathematics of statistical sequential decision-making: concentration, risk-awareness and modelling in stochastic bandits, with applications to bariatric surgery

Patrick Saux

2024-05-03

Abstract: This thesis studies some of the mathematical challenges that arise in the analysis of statistical sequential decision-making algorithms for postoperative patient follow-up. Stochastic bandits (multi-armed, contextual) model the learning of a sequence of actions (a policy) by an agent in an uncertain environment in order to maximise observed rewards. To learn optimal policies, bandit algorithms have to balance the exploitation of current knowledge and the exploration of uncertain actions. Such algorithms have been widely studied and deployed in industrial applications with large datasets, low-risk decisions and clear modelling assumptions, such as clickthrough rate maximisation in online advertising. By contrast, digital health recommendations call for a whole new paradigm of small samples, risk-averse agents and complex, nonparametric modelling. To this end, we developed new safe, anytime-valid concentration bounds (Bregman, empirical Chernoff), introduced a new framework for risk-aware contextual bandits (with elicitable risk measures) and analysed a novel class of nonparametric bandit algorithms under weak assumptions (Dirichlet sampling). In addition to the theoretical guarantees, these results are supported by in-depth empirical evidence. Finally, as a first step towards personalised postoperative follow-up recommendations, we developed with medical doctors and surgeons an interpretable machine learning model to predict the long-term weight trajectories of patients after bariatric surgery.
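The exploration-exploitation trade-off mentioned in the abstract can be illustrated with a minimal sketch. The code below uses the standard UCB1 index policy on Bernoulli arms; it is a textbook baseline chosen for illustration, not one of the algorithms developed in the thesis. The function name `ucb1`, the arm means, the horizon and the seed are all hypothetical.

```python
import math
import random


def ucb1(arm_means, horizon, seed=0):
    """Run the textbook UCB1 policy on simulated Bernoulli arms.

    arm_means: hypothetical success probabilities, unknown to the agent.
    Returns the number of times each arm was pulled.
    """
    rng = random.Random(seed)
    n_arms = len(arm_means)
    counts = [0] * n_arms   # pulls per arm
    sums = [0.0] * n_arms   # cumulative reward per arm

    for t in range(1, horizon + 1):
        if t <= n_arms:
            # Initialisation: pull every arm once.
            arm = t - 1
        else:
            # Index = empirical mean (exploitation) + confidence bonus (exploration).
            indices = [
                sums[a] / counts[a] + math.sqrt(2.0 * math.log(t) / counts[a])
                for a in range(n_arms)
            ]
            arm = max(range(n_arms), key=indices.__getitem__)

        # Simulated Bernoulli reward for the chosen arm.
        reward = 1.0 if rng.random() < arm_means[arm] else 0.0
        counts[arm] += 1
        sums[arm] += reward

    return counts


# Illustrative run: three arms, the last one being the best.
counts = ucb1([0.2, 0.5, 0.8], horizon=5000)
print(counts)
```

The bonus term shrinks as an arm is pulled more often, so poorly explored arms are revisited while most pulls concentrate on the arm with the highest empirical mean. The small-sample, risk-averse and nonparametric settings described in the abstract are precisely where such a baseline may be inadequate.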

Machine Learning, Statistics Theory

What problem does this paper attempt to address?

The thesis investigates mathematical challenges in statistical sequential decision-making, particularly the analysis of algorithms for postoperative patient follow-up. It studies stochastic bandits (multi-armed and contextual), in which an agent learns a sequence of actions (a policy) to maximise observed rewards in an uncertain environment. It proposes new safe, anytime-valid concentration bounds, a framework for risk-aware contextual bandits, and a class of nonparametric bandit algorithms, motivated by digital health recommendations. As a first step towards personalised follow-up, it also presents an interpretable machine learning model, developed with medical doctors and surgeons, that predicts the long-term weight trajectories of patients after bariatric surgery.

Non-stationary Bandits with Habituation and Recovery Dynamics and Knapsack Constraints

Learning Modular Safe Policies in the Bandit Setting with Application to Adaptive Clinical Trials

Understanding the stochastic dynamics of sequential decision-making processes: A path-integral analysis of multi-armed bandits

A Risk-Averse Framework for Non-Stationary Stochastic Multi-Armed Bandits

Probabilistic Prediction for Binary Treatment Choice: with focus on personalized medicine

Collapsing Bandits and Their Application to Public Health Interventions

Algorithms for multi-armed bandit problems

Geometry-Aware Approaches for Balancing Performance and Theoretical Guarantees in Linear Bandits

A Contextual-bandit-based Approach for Informed Decision-making in Clinical Trials

A Survey of Risk-Aware Multi-Armed Bandits

Scalable Decision-Focused Learning in Restless Multi-Armed Bandits with Application to Maternal and Child Health

A New Bandit Setting Balancing Information from State Evolution and Corrupted Context

Multi-armed Bandit Models for the Optimal Design of Clinical Trials: Benefits and Challenges

Clinical Online Recommendation with Subgroup Rank Feedback

A Case Study of Stochastic Optimization in Health Policy: Problem Formulation and Preliminary Results

Thompson sampling for zero-inflated count outcomes with an application to the Drink Less mobile health study

Correlational Dueling Bandits with Application to Clinical Treatment in Large Decision Spaces

Identifiable latent bandits: Combining observational data and exploration for personalized healthcare

Bandit Algorithms for Precision Medicine

A Bandit Model for Human-Machine Decision Making with Private Information and Opacity