Abstract: Motivated by the emerging needs of personalized preventative intervention in many healthcare applications, we consider a multi-stage, dynamic decision-making problem in the online setting with unknown model parameters. To deal with the pervasive issue of small sample size in personalized planning, we develop a novel data-pooling reinforcement learning (RL) algorithm based on a general perturbed value iteration framework. Our algorithm adaptively pools historical data, with three main innovations: (i) the pooling weight is tied directly to decision performance (measured by regret), as opposed to estimation accuracy in conventional methods; (ii) no parametric assumptions are needed between historical and current data; and (iii) data sharing is required only via aggregate statistics, as opposed to patient-level data. Our data-pooling algorithmic framework applies to a variety of popular RL algorithms, and we establish a theoretical performance guarantee showing that our pooling version achieves a regret bound strictly smaller than that of the no-pooling counterpart. We substantiate the theoretical development with empirically better performance of our algorithm via a case study in the context of post-discharge intervention to prevent unplanned readmissions, generating practical insights for healthcare management. In particular, our algorithm alleviates privacy concerns about sharing health data, which (i) opens the door for individual organizations to leverage public datasets or published studies to better manage their own patients; and (ii) provides the basis for public policy makers to encourage organizations to share aggregate data to improve population health outcomes for the broader community.

Identifying Decision Points for Safe and Interpretable Reinforcement Learning in Hypotension Treatment

Identifying Distinct, Effective Treatments for Acute Hypotension with SODA-RL: Safely Optimized Diverse Accurate Reinforcement Learning

Is Deep Reinforcement Learning Ready for Practical Applications in Healthcare? A Sensitivity Analysis of Duel-DDQN for Hemodynamic Management in Sepsis Patients

Optimal Treatment Strategies for Critical Patients with Deep Reinforcement Learning

Reinforcement Learning in Clinical Medicine: a Method to Optimize Dynamic Treatment Regime over Time.

Pruning the Way to Reliable Policies: A Multi-Objective Deep Q-Learning Approach to Critical Care

Deep Offline Reinforcement Learning for Real-world Treatment Optimization Applications

Trajectory Inspection: A Method for Iterative Clinician-Driven Design of Reinforcement Learning Studies

Policy Learning for Individualized Treatment Regimes on Infinite Time Horizon

Predicting the Need for Blood Transfusion in Intensive Care Units with Reinforcement Learning

Reinforcement Learning For Survival, A Clinically Motivated Method For Critically Ill Patients

Offline Inverse Constrained Reinforcement Learning for Safe-Critical Decision Making in Healthcare

Reinforcement Learning For Sepsis Treatment: A Continuous Action Space Solution

Safe and Interpretable Estimation of Optimal Treatment Regimes

Reinforcement Learning in Dynamic Treatment Regimes Needs Critical Reexamination

Quasi-optimal Reinforcement Learning with Continuous Actions

DTR-Bench: An in silico Environment and Benchmark Platform for Reinforcement Learning Based Dynamic Treatment Regime

Reinforcement Learning in Healthcare: A Survey

Challenges for Reinforcement Learning in Healthcare

Dynamic Measurement Scheduling for Event Forecasting using Deep RL

Data-pooling Reinforcement Learning for Personalized Healthcare Intervention