Abstract:

BACKGROUND AND OBJECTIVE: Risk stratification aims to give physicians an accurate assessment of a patient's clinical risk so that an individualized prevention or management strategy can be developed and delivered. Existing risk stratification techniques mainly predict the overall risk of an individual patient in a supervised manner and, at the cohort level, often offer little insight beyond a flat score-based segmentation of the labeled clinical dataset. To address this, we propose a new approach to risk stratification that explores a large volume of electronic health records (EHRs) in an unsupervised fashion.

METHODS: We propose a novel probabilistic topic modeling framework, the probabilistic risk stratification model (PRSM), based on Latent Dirichlet Allocation (LDA). PRSM represents a patient's clinical state as a probabilistic combination of latent sub-profiles and generates sub-profile-specific risk tiers of patients from their EHRs in a fully unsupervised fashion. The resulting strata can be readily identified as high-, medium- and low-risk. We also present an extension of PRSM, weakly supervised PRSM (WS-PRSM), which incorporates minimal prior information into the model in order to improve risk stratification accuracy and to keep the models highly portable to risk stratification tasks for various diseases.

RESULTS: We verify the effectiveness of the proposed approach on a clinical dataset containing 3463 coronary heart disease (CHD) patient instances. Both PRSM and WS-PRSM were compared with two established supervised risk stratification algorithms, logistic regression and support vector machine, and were effective in risk stratification of CHD as measured by the Area Under the receiver operating characteristic Curve (AUC). In addition, WS-PRSM achieves a performance gain of over 2% compared with PRSM on the experimental dataset, demonstrating that incorporating risk scoring knowledge as prior information can improve risk stratification performance.

CONCLUSIONS: Experimental results show that our models achieve competitive performance in risk stratification in comparison with existing supervised approaches. In addition, the unsupervised nature of our models makes them highly portable to risk stratification tasks for various diseases. Moreover, the patient sub-profiles and sub-profile-specific risk tiers generated by our models are coherent and informative, and offer significant potential for further tasks such as patient cohort analysis. We hypothesize that the proposed framework can readily meet the demand for risk stratification from a large volume of EHRs in an open-ended fashion.
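The two ideas the abstract relies on, a patient as a probabilistic combination of latent sub-profiles and evaluation by AUC, can be illustrated with a minimal sketch. This is not the PRSM inference procedure, which would learn the mixtures and tier distributions from EHRs. The function names, the numeric tier weights, and every number below are hypothetical and chosen only for illustration.

```python
from typing import Sequence

# Hypothetical scores for the (low, medium, high) risk tiers.
TIER_WEIGHTS = (0.0, 0.5, 1.0)


def mixture_risk(theta: Sequence[float],
                 tier_probs: Sequence[Sequence[float]],
                 tier_weights: Sequence[float] = TIER_WEIGHTS) -> float:
    """Expected risk score of one patient.

    theta[k]         : weight of latent sub-profile k for this patient (sums to 1)
    tier_probs[k][t] : assumed probability of risk tier t under sub-profile k
    """
    if abs(sum(theta) - 1.0) > 1e-9:
        raise ValueError("theta must sum to 1")
    score = 0.0
    for weight_k, probs_k in zip(theta, tier_probs):
        # Expected tier score within sub-profile k, weighted by the mixture.
        score += weight_k * sum(p * w for p, w in zip(probs_k, tier_weights))
    return score


def auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """AUC as the probability that a random positive case outscores a
    random negative case; ties count as one half."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Two made-up sub-profiles: the first mostly low-risk, the second mostly high-risk.
TIER_PROBS = [
    [0.80, 0.15, 0.05],
    [0.10, 0.30, 0.60],
]

# Made-up sub-profile mixtures for four patients and made-up outcome labels.
patient_thetas = [[0.9, 0.1], [0.2, 0.8], [0.5, 0.5], [0.7, 0.3]]
labels = [0, 1, 0, 1]

scores = [mixture_risk(theta, TIER_PROBS) for theta in patient_thetas]
print(scores)
print(auc(scores, labels))
```

In this toy setup the outcome labels are used only for evaluation, never to compute the scores, which mirrors the unsupervised setting the abstract describes.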
