Machine learning algorithms to predict mild cognitive impairment in older adults in China: A cross-sectional study

Yanliqing Song, Quan Yuan, Haoqiang Liu, KeNan Gu, Yue Liu
DOI: https://doi.org/10.1016/j.jad.2024.09.059
2024-09-11
Abstract: Objective: This study aimed to explore the predictive value of machine learning (ML) for mild cognitive impairment (MCI) among older adults in China and to identify important factors causing MCI.
Methods: A total of 6434 older adults were selected from the 2020 wave of the China Health and Retirement Longitudinal Study (CHARLS), and the dataset was divided into a training set and a test set at a ratio of 6:4. Six ML algorithms were used to construct a prediction model for MCI in older adults: logistic regression, k-nearest neighbors (KNN), support vector machine (SVM), decision tree (DT), LightGBM, and random forest (RF). The Delong test was used to compare the ROC curves of the different models, and decision curve analysis (DCA) was used to evaluate model performance. SHAP values were then used to explain the model by quantifying each feature's contribution to the predictions. The Matthews correlation coefficient (MCC) was calculated to evaluate the performance of the models on imbalanced datasets. Additionally, causal analysis and counterfactual analysis were conducted to understand feature importance and variable effects.
Results: The area under the ROC curve of the models ranged from 0.71 to 0.77, and the differences between models were significant (P < 0.01). The DCA results showed that LightGBM had the largest net benefit across the probability thresholds examined. Among all the models, LightGBM demonstrated the highest performance and stability. The five most important features for predicting MCI were educational level, social events, gender, relationship with children, and age. Causal analysis revealed that these variables had a significant impact on MCI, with an average treatment effect of -0.144. Counterfactual analysis further validated these findings by simulating different scenarios, such as improving educational level, increasing age, and increasing social events.
Conclusion: ML algorithms can effectively predict MCI among older adults in China and identify the important factors causing MCI.
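For readers unfamiliar with two of the evaluation metrics named in the Methods, the sketch below shows how the Matthews correlation coefficient and the net benefit used in decision curve analysis can be computed from a confusion matrix. It is only an illustration under stated assumptions: the labels and predicted probabilities are made-up toy values, not CHARLS data, and the function names (`confusion_counts`, `mcc`, `net_benefit`) are hypothetical helpers, not the authors' code.

```python
import math


def confusion_counts(y_true, y_prob, threshold):
    """Return (tp, fp, tn, fn), predicting MCI when probability >= threshold."""
    tp = fp = tn = fn = 0
    for truth, prob in zip(y_true, y_prob):
        pred = 1 if prob >= threshold else 0
        if pred == 1 and truth == 1:
            tp += 1
        elif pred == 1 and truth == 0:
            fp += 1
        elif pred == 0 and truth == 0:
            tn += 1
        else:
            fn += 1
    return tp, fp, tn, fn


def mcc(tp, fp, tn, fn):
    """Matthews correlation coefficient; defined as 0.0 when the denominator is 0."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return 0.0 if denom == 0 else (tp * tn - fp * fn) / denom


def net_benefit(tp, fp, n, threshold):
    """DCA net benefit: TP/n - FP/n * pt / (1 - pt), for threshold pt in (0, 1)."""
    return tp / n - (fp / n) * threshold / (1 - threshold)


# Toy, made-up data (assumption): 1 = MCI, 0 = no MCI.
y_true = [1, 1, 1, 0, 0, 0, 0, 0, 0, 0]
y_prob = [0.9, 0.7, 0.3, 0.6, 0.2, 0.1, 0.4, 0.2, 0.1, 0.3]

tp, fp, tn, fn = confusion_counts(y_true, y_prob, threshold=0.5)
print("MCC:", round(mcc(tp, fp, tn, fn), 3))
print("Net benefit at 0.5:", round(net_benefit(tp, fp, len(y_true), 0.5), 3))
```

A decision curve repeats the net benefit calculation over a range of thresholds and compares each model against the "treat all" and "treat none" strategies. MCC uses all four cells of the confusion matrix, which is why it is often preferred over accuracy when the classes are imbalanced.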