Two machine learning-derived nomogram for predicting the occurrence and severity of acute graft-versus-host disease: a retrospective study based on serum biomarkers

Qiang He, Xin Li, Yuan Fang, Fansheng Kong, Zhe Yu, Linna Xie
DOI: https://doi.org/10.3389/fgene.2024.1421980
IF: 3.7
2024-11-09
Frontiers in Genetics
Abstract:
Background: Acute graft-versus-host disease (aGVHD) is a common complication after allogeneic hematopoietic stem cell transplantation (allo-HSCT), with high morbidity and mortality. Although glucocorticoids are the standard treatment, only half of patients achieve complete remission. Thus, there is an urgent need to screen biomarkers for the diagnosis of aGVHD to help identify individuals at risk. This study aimed to construct prediction models for the occurrence and severity of aGVHD using two machine learning algorithms based on serum biochemical data.
Methods: Clinical data of 120 patients with hematological diseases who received allo-HSCT were retrospectively analyzed. Seventy-six patients developed aGVHD, including 56 with grade I/II and 20 with grade III/IV disease. First, 15 serum biochemical indicators were considered as potential risk factors; differences in their levels between the non-aGVHD and aGVHD groups were examined, followed by evaluation of their diagnostic performance. Subsequently, to develop the prediction models for the occurrence and severity of aGVHD, LASSO and random forest (RF) analyses were performed on these indicators. Finally, Venn diagram analysis was used to identify the biomarkers selected by both algorithms, which were then used to construct the nomograms. Model performance was assessed with calibration curves. Internal and external validations were performed based on risk score models and ROC curve analyses.
Results: A total of 12 of the 15 indicators differed significantly between the aGVHD and non-aGVHD groups, with AUC values > 0.75. In the machine learning analysis, eight features (LAG-3, TLR-2, PD-L1, IP-10, elafin, REG-3α, ST2, TIM3) were selected to distinguish aGVHD from non-aGVHD, and seven variables (LAG-3, TLR-2, PD-1, Flt_3, IL-9, elafin, TIM3) were selected to distinguish grade I/II from grade III/IV. The corresponding nomogram models were then established, and calibration curves showed that the predictions were in good agreement with the actual probabilities. A biomarker-based risk score model was also constructed, which achieved AUC values > 0.89 in both the internal and external datasets.
Conclusion: Clinical variables screened through machine learning algorithms can predict the risk and severity of aGVHD. Our findings may help clinicians develop more personalized and reasonable management strategies.
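As a rough illustration of two steps named in the Methods (keeping only the features that both selectors agree on, then scoring a risk model by ROC AUC), the following minimal Python sketch may help. It is not the authors' code: the per-algorithm feature lists, the labels, and the risk scores are invented for illustration, and the helper names `shared_features` and `roc_auc` are hypothetical. The LASSO and RF fitting steps themselves are not reproduced here.

```python
from typing import List, Sequence


def shared_features(first: Sequence[str], second: Sequence[str]) -> List[str]:
    """Return the features present in both lists, in the order of the first list."""
    second_set = set(second)
    return [name for name in first if name in second_set]


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """ROC AUC as the chance that a random positive outscores a random negative.

    Ties count as half a win (the Mann-Whitney U formulation).
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative sample")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Hypothetical selector outputs; NOT the per-algorithm lists from the study.
lasso_pick = ["LAG-3", "TLR-2", "elafin", "ST2", "IL-9"]
rf_pick = ["TLR-2", "LAG-3", "TIM3", "elafin"]
common = shared_features(lasso_pick, rf_pick)

# Synthetic data: 1 = aGVHD, 0 = non-aGVHD; scores are made-up risk scores.
labels = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.7, 0.4, 0.5, 0.3, 0.2]
auc = roc_auc(labels, scores)

print(common)
print(round(auc, 3))
```

The pairwise AUC is quadratic in sample size, which is acceptable for a cohort of this scale; a rank-based implementation would be preferable for larger datasets.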
genetics & heredity