Abstract: Observational longitudinal studies are a common means of studying treatment efficacy and safety in chronic mental illness. In many such studies, treatment changes may be initiated by either the patient or their clinician and can thus vary widely across patients in their timing, number, and type. Indeed, in the observational longitudinal pathway of the STEP-BD study of bipolar depression, one of the motivations for this work, no two patients have the same treatment history even after coarsening clinic visits to a weekly time-scale. Estimation of an optimal treatment regime using such data is challenging because one cannot naively pool together patients with the same treatment history, as is required by methods based on inverse probability weighting, nor is it possible to apply backward induction over the decision points, as is done in Q-learning and its variants. Thus, additional structure is needed to effectively pool information across patients and within a patient over time. Current scientific theory for many chronic mental illnesses maintains that a patient's disease status can be conceptualized as transitioning among a small number of discrete states. We use this theory to inform the construction of a partially observable Markov decision process model of patient health trajectories wherein observed health outcomes are dictated by a patient's latent health state. Using this model, we derive and evaluate estimators of an optimal treatment regime under two common paradigms for quantifying long-term patient health. The finite sample performance of the proposed estimators is demonstrated through a series of simulation experiments and application to the observational pathway of the STEP-BD study. We find that the proposed method provides high-quality estimates of an optimal treatment strategy in settings where existing approaches cannot be applied without ad hoc modifications.

A Flexible Framework for Incorporating Patient Preferences Into Q-Learning

Data Quality Aware Hierarchical Federated Reinforcement Learning Framework for Dynamic Treatment Regimes

Reinforcement Learning in Clinical Medicine: a Method to Optimize Dynamic Treatment Regime over Time

Penalized Q-Learning for Dynamic Treatment Regimes

Robust Hybrid Learning for Estimating Personalized Dynamic Treatment Regimens

Set-valued dynamic treatment regimes for competing outcomes

An optimal learning method for developing personalized treatment regimes

Adaptive Weight Learning for Multiple Outcome Optimization With Continuous Treatment

Identifying optimally cost-effective dynamic treatment regimes with a Q-learning approach

Pruning the Way to Reliable Policies: A Multi-Objective Deep Q-Learning Approach to Critical Care

Estimating Optimal Infinite Horizon Dynamic Treatment Regimes via pT-Learning

Latent-state models for precision medicine

Representation Learning for Integrating Multi-domain Outcomes to Optimize Individualized Treatments

Learning Individualized Treatment Rules with Estimated Translated Inverse Propensity Score

Learning Optimal Dynamic Treatment Regimens Subject to Stagewise Risk Controls

Intervening in the Lives of Youth with Complex Behavioral Health Challenges and Their Families: The Role of the Wraparound Process

Making SMART decisions in prophylaxis and treatment studies

Beyond Human Preferences: Exploring Reinforcement Learning Trajectory Evaluation and Improvement through LLMs

Stage-Aware Learning for Dynamic Treatments

Pareto-Optimal Estimation and Policy Learning on Short-term and Long-term Treatment Effects

Optimizing personalized treatments for targeted patient populations across multiple domains