A Random Forest Algorithm for Assessing Risk Factors Associated With Chronic Kidney Disease: Observational Study

Pei Liu, Yijun Liu, Hao Liu, Linping Xiong, Changlin Mei, Lei Yuan
DOI: https://doi.org/10.2196/48378
2024-06-03
Abstract: Background: The prevalence and mortality of chronic kidney disease (CKD) are rising year by year, making it a global public health issue. The economic burden caused by CKD is increasing at a rate of 1% per year. CKD is highly prevalent and costly to treat, yet it unfortunately remains underrecognized. Early detection and intervention are therefore vital for reducing the treatment burden on patients and slowing disease progression. Objective: In this study, we investigated the advantages of using the random forest (RF) algorithm for assessing risk factors associated with CKD. Methods: We included 40,686 people with complete screening records who underwent screening between January 1, 2015, and December 22, 2020, in Jing'an District, Shanghai, China. We grouped the participants into those with and those without CKD based on glomerular filtration rate staging and albuminuria grouping. We used a logistic regression model to assess the relationship between CKD and risk factors. The RF machine learning algorithm was used to score the predictor variables and rank them by importance in order to construct a prediction model. Results: The logistic regression model revealed that gender, older age, obesity, an abnormal estimated glomerular filtration rate index, retirement status, and participation in urban employee medical insurance were significantly associated with the risk of CKD. In the RF-based ranking, the top 4 factors influencing CKD were age, albuminuria, working status, and urinary albumin-creatinine ratio. The RF model achieved an area under the receiver operating characteristic curve of 93.15%. Conclusions: Our findings indicate that the RF algorithm has significant predictive value for assessing risk factors associated with CKD and allows the screening of individuals with risk factors. This has crucial implications for early intervention and prevention of CKD.
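To make the reported area under the receiver operating characteristic curve (AUC) concrete, the following is a minimal sketch of how an AUC can be computed from model scores. It is not the authors' code: the function name `roc_auc`, the labels, and the scores are all hypothetical and chosen only for illustration. The sketch uses the standard pairwise interpretation of AUC, namely the fraction of (CKD, non-CKD) pairs in which the person with CKD receives the higher score, with ties counted as half.

```python
from itertools import product


def roc_auc(labels, scores):
    """Return the AUC as the share of (positive, negative) pairs ranked correctly.

    labels: iterable of 0/1 values (1 = has the condition).
    scores: iterable of model scores, higher meaning "more likely positive".
    Tied scores contribute half a correct pair.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p, n in product(pos, neg)
    )
    return wins / (len(pos) * len(neg))


# Hypothetical toy data (not from the study): 1 = CKD, 0 = no CKD.
labels = [1, 1, 1, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.5, 0.3, 0.2, 0.1]

auc = roc_auc(labels, scores)
print(f"AUC = {auc:.4f}")
```

In this toy data, 11 of the 12 possible (CKD, non-CKD) pairs are ordered correctly. Read the same way, an AUC of 93.15% means that in roughly 93 of 100 such pairs the model would assign the higher risk score to the person with CKD. The pairwise loop is quadratic in the number of participants, so a cohort of this size would in practice call for a rank-based implementation from a statistics library.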