Learning predictive checklists from continuous medical data

Yukti Makhija, Edward De Brouwer, Rahul G. Krishnan
DOI: https://doi.org/10.48550/arXiv.2211.07076
2022-11-14
Abstract: Checklists, while being only recently introduced in the medical domain, have become highly popular in daily clinical practice due to their combined effectiveness and great interpretability. Checklists are usually designed by expert clinicians who manually collect and analyze available evidence. However, the increasing quantity of available medical data is calling for a partially automated checklist design. Recent works have taken a step in that direction by learning predictive checklists from categorical data. In this work, we propose to extend this approach to accommodate learning checklists from continuous medical data using a mixed-integer programming approach. We show that this extension outperforms a range of explainable machine learning baselines on the prediction of sepsis from intensive care clinical trajectories.
Machine Learning
What problem does this paper attempt to address?
The problem this paper addresses is how to automatically learn predictive checklists from continuous medical data to assist clinical decision-making, in particular for sepsis prediction in intensive care.

### Problem Background

In the medical field, checklists are popular because they are effective and highly interpretable. Traditional checklists are usually designed manually by expert clinicians, who formulate rules by collecting and analyzing existing evidence. As the amount of medical data grows sharply, designing checklists by hand has become increasingly difficult and time-consuming, so researchers have begun to explore partially automated checklist design.

### Existing Work

Previous studies learned predictive checklists from categorical (for example, Boolean) data. However, much clinical data, such as image or time-series data, is continuous in nature, which limits the scope of these methods.

### Main Contributions of the Paper

This paper proposes a method for learning predictive checklists from continuous-valued medical data using Mixed-Integer Programming (MIP). Specifically:

1. **Introducing Threshold Learning for Continuous Features**:
   - Convert continuous features into binary concepts by comparing each feature against a learned threshold.
   - Use mixed-integer programming to optimize these thresholds together with the weights, thereby constructing an optimal checklist.
2. **Performance Improvement**:
   - On the sepsis prediction task, the method outperforms other interpretable machine-learning baselines, with a particularly significant improvement in recall.
3. **Interpretability and Practicality**:
   - Although it may not be as accurate as complex black-box models (such as a multi-layer perceptron, MLP), the checklist produced by this method is more interpretable and practical, and therefore better suited to clinical use.
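The threshold-learning idea in the first contribution can be illustrated with a minimal Python sketch. It shows only the binarization step, with thresholds fixed by hand rather than learned. The function name `binarize`, the feature values, the threshold values, and the direction of the comparison (a concept is "on" when the feature is at or above its threshold) are all assumptions made for illustration, not details taken from the paper.

```python
from typing import List, Sequence


def binarize(x: Sequence[float], thresholds: Sequence[float]) -> List[int]:
    """Map one patient's continuous features to binary concepts.

    Assumption: concept j is 1 when feature j is at or above threshold j.
    """
    if len(x) != len(thresholds):
        raise ValueError("x and thresholds must have the same length")
    return [int(value >= t) for value, t in zip(x, thresholds)]


# Hypothetical feature values and hand-picked thresholds (illustrative only).
patient = [112.0, 38.6, 1.4]
thresholds = [100.0, 38.0, 2.0]

concepts = binarize(patient, thresholds)
print(concepts)  # [1, 1, 0]
```

In the paper's setting the thresholds are decision variables of the mixed-integer program; here they are constants so that the mapping from continuous values to checklist items is easy to follow.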
### Mathematical Formulas

- Checklist prediction formula:

  \[ \hat{y}_i=\left(w^{T} C\left(X_i\right) \geq M\right) \]

  where \(C\left(X_i\right)\) is the binary concept vector derived from the input variable \(X_i\), \(w \in \{0, 1\}^d\) is the learnable binary weight vector, and \(M \in \mathbb{R}\) is the threshold parameter. The prediction \(\hat{y}_i\) is 1 when at least \(M\) of the selected concepts are true and 0 otherwise.

- Objective function:

  \[ w^*, C^*, M^*=\arg \min _{w, C, M} L(y, \hat{y}) \]

- Mixed-integer programming objective:

  \[ \min _{w, z, M, t} l_++\lambda l_-+\epsilon_N N+\epsilon_M M \]

  where \(l_+\) and \(l_-\) are the numbers of positive-class and negative-class misclassifications respectively, and \(\lambda\) weights the two error types against each other. The terms \(\epsilon_N N\) and \(\epsilon_M M\) are small penalties that control model complexity, with \(N\) the number of items in the checklist and \(M\) the threshold from the prediction formula.

Through this method, the paper shows how to learn effective and interpretable predictive checklists from continuous medical data, providing strong support for clinical decision-making.
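The prediction rule and the mixed-integer objective above can be evaluated for a fixed candidate checklist as in the following sketch. It does not solve the optimization problem; it only computes the quantities the solver would trade off. The function names, the toy data, the default values of `lam`, `eps_n`, and `eps_m`, and the reading of \(N\) as the number of selected items (`sum(w)`) are assumptions made for illustration.

```python
from typing import Sequence


def checklist_predict(concepts: Sequence[int], w: Sequence[int], m: float) -> int:
    """Return 1 if the weighted count of true concepts reaches the threshold m."""
    score = sum(wj * cj for wj, cj in zip(w, concepts))
    return int(score >= m)


def mip_objective(
    y_true: Sequence[int],
    y_pred: Sequence[int],
    w: Sequence[int],
    m: float,
    lam: float = 1.0,      # assumed weight on negative-class errors
    eps_n: float = 1e-3,   # assumed small penalty on checklist length
    eps_m: float = 1e-3,   # assumed small penalty on the threshold
) -> float:
    """Evaluate l_+ + lam * l_- + eps_n * N + eps_m * M for a fixed checklist."""
    l_plus = sum(1 for y, p in zip(y_true, y_pred) if y == 1 and p == 0)
    l_minus = sum(1 for y, p in zip(y_true, y_pred) if y == 0 and p == 1)
    n_items = sum(w)  # assumption: N counts the selected checklist items
    return l_plus + lam * l_minus + eps_n * n_items + eps_m * m


# Toy binary concept vectors for four hypothetical patients, with labels.
C = [[1, 1, 0], [0, 1, 0], [1, 0, 1], [0, 0, 0]]
y = [1, 0, 1, 0]
w = [1, 1, 1]  # all three concepts are on the checklist

preds_m2 = [checklist_predict(c, w, 2) for c in C]
preds_m1 = [checklist_predict(c, w, 1) for c in C]
print(preds_m2, mip_objective(y, preds_m2, w, 2))
print(preds_m1, mip_objective(y, preds_m1, w, 1))
```

On this toy data, requiring two checked items classifies every patient correctly, while requiring only one produces a false positive for the second patient, so the first setting has the lower objective value.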