Development and Validation of Interpretable Machine Learning Models to Predict Glomerular Filtration Rate in Chronic Kidney Disease Colombian Patients

Luis Rojas, Angela Janeth Pereira, William David Amador, Albert Montenegro, Walberto Buelvas, Victor de la Espriella
DOI: https://doi.org/10.1177/00045632241285528
2024-09-06
Abstract: Background: Machine learning (ML) predictive models have shown their capability to improve risk prediction and assist medical decision-making. Nevertheless, there is a lack of accurate systems for the early identification of future rapid chronic kidney disease (CKD) progressors in Colombia and in South America more broadly. Objective: The purpose of this study was to develop a series of interpretable machine learning models that predict glomerular filtration rate (GFR) at 6, 9, and 12 months. Study design and setting: Over 29,000 patients with CKD stages 1 to 3b (estimated GFR [eGFR] <60 mL/min/1.73 m²) and an average of 3 years of follow-up data were included. We used extreme gradient boosting (XGBoost) to build three models that predict the next eGFR value. Models were internally and externally validated. In addition, we included SHapley Additive exPlanation (SHAP) values to offer interpretable global and local prediction models. Results: All models performed well in development and external validation. However, the 6-month XGBoost prediction model showed the best performance in internal validation (average MAE = 6.07; RMSE = 78.87) and in external validation (average MAE = 6.45; RMSE = 18.94). The three most influential features that pushed the predicted eGFR toward lower values were the interpolated values for eGFR and creatinine, and eGFR at baseline. Conclusion: In the current study we developed and validated machine learning models to predict the next eGFR value at different intervals. Furthermore, we attempted to address the need for prediction explanation by offering transparent predictions.
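For readers unfamiliar with the two error metrics reported in the Results, the following is a minimal sketch of how mean absolute error (MAE) and root mean squared error (RMSE) are conventionally computed for a regression target such as eGFR. The function names and the eGFR values are hypothetical and chosen only for illustration; they are not taken from the study, and the abstract does not specify how the authors computed or averaged their metrics.

```python
import math


def mae(y_true, y_pred):
    """Mean absolute error: the average of |observed - predicted|."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def rmse(y_true, y_pred):
    """Root mean squared error: square root of the mean squared difference."""
    mse = sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)
    return math.sqrt(mse)


# Hypothetical eGFR values in mL/min/1.73 m^2 (illustrative only, not study data).
observed = [55.0, 48.0, 62.0, 40.0]
predicted = [50.0, 51.0, 60.0, 46.0]

print(f"MAE  = {mae(observed, predicted):.2f}")
print(f"RMSE = {rmse(observed, predicted):.2f}")
```

Both metrics are expressed in the units of the target. Because RMSE squares each error before averaging, it weights large errors more heavily and is never smaller than MAE on the same predictions.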