A Machine Learning approach to identify groups of patients with hematological malignant disorders

Pablo Rodríguez-Belenguer, José Luis Piñana, Manuel Sánchez-Montañés, Emilio Soria-Olivas, Marcelino Martínez-Sober, Antonio J. Serrano-López
DOI: https://doi.org/10.1016/j.cmpb.2024.108011
IF: 6.1
2024-01-11
Computer Methods and Programs in Biomedicine
Abstract:
Background and Objective: The study addresses the need for strong vaccine-induced antibodies against SARS-CoV-2 in immunocompromised hematological malignancy (HM) patients to reduce COVID-19 severity. Despite vaccination efforts, over a third of HM patients remain unresponsive, increasing their risk of severe breakthrough infections. The study aims to leverage machine learning's adaptability to COVID-19 dynamics, efficiently selecting patient-specific features to enhance predictions and improve healthcare strategies. Given the complex connection between COVID-19 and hematological disease, the focus is on interpretable machine learning that provides valuable insights to clinicians and biologists.
Methods: The study evaluated a dataset of more than 1600 patients with hematological diseases. The output was the achievement or non-achievement of a serological response after full COVID-19 vaccination. Various machine learning methods were applied, and the best model was selected based on metrics such as the area under the curve (AUC), sensitivity, specificity, and the Matthews correlation coefficient (MCC). Individual SHAP values were obtained for the best model, and principal component analysis (PCA) was applied to these values. The patient profiles were then analyzed within the identified clusters.
Results: Support vector machine (SVM) emerged as the best-performing model. PCA applied to the SVM-derived SHAP values resulted in four perfectly separated clusters, which were characterized by their probability of generating antibodies. Cluster 1, with the second-highest probability (69.91%), included patients with aggressive diseases and factors contributing to increased immunodeficiency. Cluster 2 had the lowest likelihood (33.3%), but the small sample size limited conclusive findings. Cluster 3, representing the majority of the population, exhibited a high rate of antibody generation (84.39%) and a better prognosis than Cluster 1. Cluster 4, with a probability of 66.33%, included patients with B-cell non-Hodgkin's lymphoma on corticosteroid therapy.
Conclusions: The methodology successfully identified four separate clusters of HM patients based on their likelihood of generating antibodies after COVID-19 vaccination. The study suggests the methodology's potential applicability to other diseases, highlighting the importance of interpretable ML in healthcare research and decision-making.
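The abstract names sensitivity, specificity, and MCC among the model-selection metrics. As a minimal sketch of how these three are computed from a binary confusion matrix, the snippet below uses the standard definitions. The function name `binary_metrics` and the counts are hypothetical illustrations and are not taken from the study.

```python
import math


def binary_metrics(tp, fp, tn, fn):
    """Return (sensitivity, specificity, MCC) from confusion-matrix counts."""
    # Sensitivity: fraction of true responders that the model identifies.
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    # Specificity: fraction of true non-responders that the model identifies.
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    # MCC: correlation between predicted and actual labels, in [-1, 1].
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return sensitivity, specificity, mcc


# Hypothetical counts for illustration only (not results from the paper).
sens, spec, mcc = binary_metrics(tp=80, fp=10, tn=30, fn=20)
print(f"sensitivity={sens:.2f} specificity={spec:.2f} MCC={mcc:.3f}")
```

MCC is often preferred over accuracy when classes are imbalanced, which may be relevant here since most patients in the reported cohort do generate antibodies.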
engineering, biomedical; computer science, interdisciplinary applications; medical informatics, theory & methods