Intelligent Diagnosis of Kawasaki Disease From Real-World Data Using Interpretable Machine Learning Models
Yifan Duan, Ruiqi Wang, Zhilin Huang, Haoran Chen, Mingkun Tang, Jiayin Zhou, Zhengyong Hu, Wanfei Hu, Zhenli Chen, Qing Qian, Haolin Wang
DOI: https://doi.org/10.1016/j.hjc.2024.08.003
2024-08-09
Abstract:
Objective: This study aimed to use real-world electronic medical record (EMR) data to develop interpretable machine learning models for diagnosing Kawasaki disease, and to explore and prioritize its significant risk factors.
Methods: A study was conducted on 4,087 pediatric patients at the Children's Hospital of Chongqing, China. Demographic data, physical examination results, and laboratory findings were collected, and statistical analyses were performed using SPSS 26.0. The optimal feature subset was used to develop intelligent diagnostic prediction models based on the Light Gradient Boosting Machine (LGBM), Explainable Boosting Machine (EBM), Gradient Boosting Classifier (GBC), Fast Interpretable Greedy-Tree Sums (FIGS), Decision Tree (DT), AdaBoost Classifier (AdaBoost), and Logistic Regression (LR). Model performance was evaluated in three dimensions: discriminative ability using Receiver Operating Characteristic (ROC) curves, calibration using calibration curves, and interpretability using Shapley Additive Explanations (SHAP) and Local Interpretable Model-Agnostic Explanations (LIME).
Results: Kawasaki disease was diagnosed in 2,971 of the participants. Thirty-one indicators, including red blood cell distribution width and erythrocyte sedimentation rate, were analyzed. The EBM model achieved an Area Under the Curve (AUC) of 0.97, second only to the GBC model. It also showed the best calibration and remained interpretable without relying on external analytical tools such as SHAP and LIME, thereby reducing interpretation bias. The model identified platelet distribution width, total protein, and erythrocyte sedimentation rate as significant predictors for the diagnosis of Kawasaki disease.
Conclusions: This study applied diverse machine learning models to the early diagnosis of Kawasaki disease. The findings showed that interpretable models, such as EBM, outperformed traditional machine learning models in both interpretability and performance. Ensuring consistency between predictive models and clinical evidence is crucial for the successful integration of artificial intelligence into real-world clinical practice.
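The Methods assess discrimination with ROC curves and calibration with calibration curves. A minimal, dependency-free Python sketch of both checks follows, assuming binary labels (1 = Kawasaki disease, 0 = not) and predicted probabilities from any of the listed classifiers. The helper names `roc_auc` and `calibration_bins`, the five-bin default, and the toy data are hypothetical illustrations, not the authors' code or data; in practice, scikit-learn's `roc_auc_score` and `sklearn.calibration.calibration_curve` cover the same ground.

```python
from typing import List, Sequence, Tuple

# Hypothetical helpers for illustration; not the authors' implementation.


def roc_auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """ROC AUC computed as the probability that a randomly chosen positive case
    scores higher than a randomly chosen negative case (ties count as one half)."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def calibration_bins(
    y_true: Sequence[int], probs: Sequence[float], n_bins: int = 5
) -> List[Tuple[float, float, int]]:
    """Group predictions into equal-width probability bins and return
    (mean predicted probability, observed positive rate, count) for each
    non-empty bin. A well-calibrated model keeps the first two values close."""
    bins: List[List[Tuple[int, float]]] = [[] for _ in range(n_bins)]
    for y, p in zip(y_true, probs):
        idx = min(int(p * n_bins), n_bins - 1)  # p == 1.0 falls into the last bin
        bins[idx].append((y, p))
    result: List[Tuple[float, float, int]] = []
    for members in bins:
        if members:
            mean_pred = sum(p for _, p in members) / len(members)
            observed = sum(y for y, _ in members) / len(members)
            result.append((mean_pred, observed, len(members)))
    return result
```

For example, with labels `[1, 1, 1, 0, 1, 0, 0, 1]` and scores `[0.9, 0.8, 0.7, 0.6, 0.55, 0.3, 0.2, 0.4]`, `roc_auc` returns 13/15 (about 0.867), because 13 of the 15 positive-negative pairs are ranked correctly. The pairwise loop costs O(n_pos × n_neg), which is fine for illustration; a rank-based computation scales better to cohorts of thousands of patients.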