An Explainable Deep Reinforcement Learning Model for Warfarin Maintenance Dosing Using Policy Distillation and Action Forging

Sadjad Anzabi Zadeh, W. Nick Street, Barrett W. Thomas
2024-04-26
Abstract: Deep Reinforcement Learning is an effective tool for drug dosing for chronic condition management. However, the final protocol is generally a black box without any justification for its prescribed doses. This paper addresses this issue by proposing an explainable dosing protocol for warfarin using a Proximal Policy Optimization method combined with Policy Distillation. We introduce Action Forging as an effective tool to achieve explainability. Our focus is on the maintenance dosing protocol. Results show that the final model is as easy to understand and deploy as the current dosing protocols and outperforms the baseline dosing algorithms.
Machine Learning
What problem does this paper attempt to address?
This paper aims to improve both the interpretability and the performance of warfarin maintenance-dosing protocols. Specifically, the authors propose an interpretable model based on Deep Reinforcement Learning (DRL) that is easy to understand and outperforms existing dosing protocols.

### Problem Background

Warfarin is a commonly used anticoagulant, but adjusting its dose is complicated because a patient's diet, lifestyle, and genetic factors can all affect the drug's efficacy. In addition, warfarin's therapeutic range is very narrow: too high a dose may lead to bleeding, while too low a dose may cause thromboembolism. Finding the appropriate dose is therefore crucial for both safety and efficacy.

Most existing warfarin dosing protocols rely on clinical trial data and supervised learning methods, such as non-linear regression models. Although these methods perform well in predicting the initial dose, they have limitations when adjusting the maintenance dose. DRL models outperform these traditional methods, but they are usually "black boxes" whose decision-making process is difficult to explain, which is unacceptable in the medical field.

### Core Contributions of the Paper

To overcome these problems, the paper proposes an interpretable deep reinforcement learning model that combines Proximal Policy Optimization (PPO) with Policy Distillation. The model improves interpretability through the following techniques:

1. **Action Forging**: Shaping the action space so that the final policy is easier to interpret. For example, reducing the frequency of dose changes and the number of dose-change options makes the model easier for doctors to understand and use.
2. **Policy Distillation**: Converting the trained DRL model into a decision tree, which yields a form similar to the dosing tables already used in clinical practice.

### Experimental Results

The experimental results show that the proposed model outperforms the existing dosing protocols and that its output is more intuitive and easier to understand. This matters to clinicians because it can increase their trust in and acceptance of the model while preserving treatment quality.

In short, the goal of the paper is to develop a system that manages warfarin doses effectively and explains its decisions clearly, in order to improve medical safety and the level of personalized treatment.
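
The two techniques named under the core contributions can be illustrated with a small, self-contained sketch. It is not the authors' implementation: the `teacher_policy` function is a made-up stand-in for a trained PPO policy, the `ALLOWED_CHANGES` menu and the INR target of 2.5 are hypothetical values chosen for illustration, and "action forging" is reduced here to snapping a raw dose change onto a short menu of options. The distillation step fits a small decision tree to the teacher's forged actions and flattens it into a table of INR ranges, which is roughly the shape of a clinical dosing table.

```python
from collections import Counter

# Hypothetical menu of dose changes (percent of current dose); not from the paper.
ALLOWED_CHANGES = (-20, -10, 0, 10, 20)

def forge_action(raw_change_pct):
    """Simplified action forging: snap a raw dose change to the nearest allowed option."""
    return min(ALLOWED_CHANGES, key=lambda option: abs(option - raw_change_pct))

def teacher_policy(inr):
    """Stand-in for a trained PPO policy: a made-up rule steering INR toward 2.5."""
    return 25.0 * (2.5 - inr)

def gini(labels):
    """Gini impurity of a list of class labels."""
    total = len(labels)
    return 1.0 - sum((count / total) ** 2 for count in Counter(labels).values())

def fit_tree(samples, max_depth):
    """Fit a small classification tree on (inr, action) pairs using Gini splits."""
    labels = [action for _, action in samples]
    majority = Counter(labels).most_common(1)[0][0]
    if max_depth == 0 or len(set(labels)) == 1:
        return ("leaf", majority)
    values = sorted({inr for inr, _ in samples})
    best = None
    for low, high in zip(values, values[1:]):
        threshold = (low + high) / 2  # candidate cut between adjacent INR values
        left = [s for s in samples if s[0] <= threshold]
        right = [s for s in samples if s[0] > threshold]
        score = (len(left) * gini([a for _, a in left])
                 + len(right) * gini([a for _, a in right]))
        if best is None or score < best[0]:
            best = (score, threshold, left, right)
    if best is None:  # every sample shares the same INR value
        return ("leaf", majority)
    _, threshold, left, right = best
    return ("split", threshold,
            fit_tree(left, max_depth - 1), fit_tree(right, max_depth - 1))

def predict(tree, inr):
    """Walk the tree from the root to a leaf and return the leaf's action."""
    while tree[0] == "split":
        _, threshold, left, right = tree
        tree = left if inr <= threshold else right
    return tree[1]

def to_table(tree, low=float("-inf"), high=float("inf")):
    """Flatten the tree into rows (low, high, action) meaning low < INR <= high."""
    if tree[0] == "leaf":
        return [(low, high, tree[1])]
    _, threshold, left, right = tree
    return to_table(left, low, threshold) + to_table(right, threshold, high)

# Distillation: query the teacher on a grid of INR values, then fit the student tree.
inr_grid = [i / 10 for i in range(10, 41)]
samples = [(inr, forge_action(teacher_policy(inr))) for inr in inr_grid]
student = fit_tree(samples, max_depth=4)

for low, high, action in to_table(student):
    print(f"INR in ({low:.2f}, {high:.2f}]: change dose by {action:+d}%")
```

The single INR feature keeps the example short; a realistic student tree would presumably also use other patient features. Restricting the teacher's outputs to a few options before distillation is what lets a shallow tree reproduce the policy with only a handful of rows.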