A Metabolism-Based Interpretable Machine Learning Prediction Model for Diabetic Retinopathy Risk: A Cross-Sectional Study in Chinese Patients with Type 2 Diabetes

Guo-Wei Zong, Wan-Ying Wang, Jun Zheng, Wei Zhang, Wei-Ming Luo, Zhong-Ze Fang, Qiang Zhang
DOI: https://doi.org/10.1155/2023/3990035
IF: 4.0608
2023-05-17
Journal of Diabetes Research
Abstract: The burden of diabetic retinopathy (DR) is increasing, and sensitive biomarkers for the disease remain insufficient. Studies have found that the metabolic profile, such as amino acids (AAs) and acylcarnitines (AcylCNs), may already be altered in the early stages of DR, indicating the potential of metabolites to become new biomarkers. We aimed to construct a metabolite-based prediction model for DR risk. This study was conducted on type 2 diabetes (T2D) patients with or without DR. Logistic regression and extreme gradient boosting (XGBoost) prediction models were constructed using the traditional clinical features and the screened features, respectively. The predictive power of the models was assessed in terms of both discrimination and calibration, and the optimal model was interpreted using Shapley Additive exPlanations (SHAP) to quantify the effect of features on prediction. The XGBoost model incorporating AA and AcylCN variables had the best comprehensive evaluation (ROCAUC = 0.82, PRAUC = 0.44, Brier score = 0.09). C18:1OH lower than 0.04 μmol/L, C18:1 lower than 0.70 μmol/L, threonine higher than 27.0 μmol/L, and tyrosine lower than 36.0 μmol/L were associated with an increased risk of developing DR. Phenylalanine higher than 52.0 μmol/L was associated with a decreased risk of developing DR. In conclusion, our study mainly used AAs and AcylCNs to construct an interpretable XGBoost model that predicts the risk of developing DR in T2D patients, which is beneficial for identifying high-risk groups and preventing or delaying the onset of DR. In addition, our study proposed possible risk cut-off values for DR for C18:1OH, C18:1, threonine, tyrosine, and phenylalanine.
endocrinology & metabolism; medicine, research & experimental
What problem does this paper attempt to address?
This paper addresses risk prediction for diabetic retinopathy (DR). Specifically, it aims to construct an interpretable, metabolite-based machine-learning model that predicts the risk of DR in patients with type 2 diabetes (T2D). With this model, the researchers hope to identify high-risk groups and prevent or delay the onset of DR.

### Research Background

Diabetic retinopathy (DR) is a common microvascular complication of diabetes and one of the leading causes of blindness among working-age people. In China, the prevalence of DR is relatively high. Early symptoms are not obvious, while late-stage disease is irreversible and responds poorly to treatment. Finding new biomarkers and developing effective prediction models is therefore important for the early identification of high-risk groups.

### Research Methods

1. **Data Collection**: The study included 1,032 patients with type 2 diabetes, of whom 162 had DR and 870 did not. Clinical characteristics and amino acid (AA) and acylcarnitine (AcylCN) metabolite profiles were collected for each patient.
2. **Feature Selection**: LASSO regression was used for feature screening, and 15 features were finally selected, including 7 amino acids (alanine, citrulline, glutamic acid, ornithine, phenylalanine, threonine, tyrosine) and 3 acylcarnitines (C18:1, C18:1OH, C18:2), as well as the patient's age, systolic blood pressure, total cholesterol, and duration of type 2 diabetes.
3. **Model Construction and Validation**: Logistic regression (LR) and extreme gradient boosting (XGBoost) were each used to construct prediction models. Model performance was evaluated by the area under the receiver operating characteristic curve (ROCAUC), the area under the precision-recall curve (PRAUC), and the Brier score.
4. **Model Explanation**: The Shapley Additive exPlanations (SHAP) method was used to explain the model and quantify the influence of each feature on the prediction result.

### Main Findings

- **Best Model**: The XGBoost model combining AA and AcylCN variables had the best comprehensive evaluation on the test set (ROCAUC = 0.82, PRAUC = 0.44, Brier score = 0.09).
- **Important Features**: C18:1OH lower than 0.04 μmol/L, C18:1 lower than 0.70 μmol/L, threonine higher than 27.0 μmol/L, and tyrosine lower than 36.0 μmol/L were associated with an increased risk of DR. Phenylalanine higher than 52.0 μmol/L was associated with a reduced risk of DR.

### Conclusion

The study constructed an interpretable, metabolite-based machine-learning model for predicting the risk of DR in patients with type 2 diabetes. The model helps to identify high-risk groups and also proposes possible DR risk thresholds, providing a basis for early prevention and intervention.
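
The three evaluation metrics named above can be illustrated with a minimal pure-Python sketch. It is not the authors' code: the function names and the toy labels and probabilities are hypothetical, and the PR area is computed as step-wise average precision, which is one common estimator and may differ from the one used in the study. For context, the expected PR area of an uninformative classifier is roughly the outcome prevalence, which is 162/1,032 ≈ 0.16 in this cohort.

```python
from typing import Sequence


def roc_auc(y_true: Sequence[int], y_prob: Sequence[float]) -> float:
    """Discrimination: chance that a random positive outranks a random negative."""
    pos = [p for y, p in zip(y_true, y_prob) if y == 1]
    neg = [p for y, p in zip(y_true, y_prob) if y == 0]
    # Tied scores count as half a win (Mann-Whitney formulation).
    wins = sum(1.0 if a > b else 0.5 if a == b else 0.0 for a in pos for b in neg)
    return wins / (len(pos) * len(neg))


def average_precision(y_true: Sequence[int], y_prob: Sequence[float]) -> float:
    """Step-wise area under the precision-recall curve.

    Simplification: assumes no tied scores among the predictions.
    """
    ranked = sorted(zip(y_prob, y_true), key=lambda t: -t[0])
    n_pos = sum(y_true)
    true_pos = 0
    total = 0.0
    for rank, (_, label) in enumerate(ranked, start=1):
        if label == 1:
            true_pos += 1
            total += true_pos / rank  # precision at this recall step
    return total / n_pos


def brier_score(y_true: Sequence[int], y_prob: Sequence[float]) -> float:
    """Calibration: mean squared error of the predicted probabilities (lower is better)."""
    return sum((p - y) ** 2 for y, p in zip(y_true, y_prob)) / len(y_true)


# Hypothetical toy data: 1 = DR, 0 = no DR; probabilities are made up.
y_true = [1, 0, 1, 0, 0, 0]
y_prob = [0.9, 0.8, 0.6, 0.3, 0.2, 0.1]

print("ROCAUC:", round(roc_auc(y_true, y_prob), 3))
print("PRAUC :", round(average_precision(y_true, y_prob), 3))
print("Brier :", round(brier_score(y_true, y_prob), 3))
```

ROCAUC and PRAUC both measure ranking quality, but PRAUC is more sensitive to class imbalance, which is why it helps to report both when only about one in six patients has the outcome. The Brier score additionally penalizes probabilities that are poorly calibrated.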