Challenges in the application of a mortality prediction model for COVID-19 patients on an Indian cohort

Yukti Makhija, Samarth Bhatia, Shalendra Singh, Sneha Kumar Jayaswal, Prabhat Singh Malik, Pallavi Gupta, Shreyas N. Samaga, Shreya Johri, Sri Krishna Venigalla, Rabi Narayan Hota, Surinder Singh Bhatia, Ishaan Gupta
DOI: https://doi.org/10.48550/arXiv.2101.07215
2021-01-15
Abstract: Many countries are now experiencing the third wave of the COVID-19 pandemic, which is straining healthcare resources with an acute shortage of hospital beds and ventilators for critically ill patients. The situation is especially severe in India, which has the second-largest number of COVID-19 cases and a relatively resource-scarce medical infrastructure. It is therefore essential to triage patients by the severity of their disease and direct resources towards the critically ill. Yan et al. [1] have published highly pertinent research that uses machine learning (ML) methods to predict the outcome of COVID-19 patients from their clinical parameters on the day of admission. They used the XGBoost algorithm, a type of ensemble model, to build the mortality prediction model; the final classifier is built through the sequential addition of multiple weak classifiers. The clinically operable decision rule was obtained from a 'single-tree XGBoost' and uses lactic dehydrogenase (LDH), lymphocyte, and high-sensitivity C-reactive protein (hs-CRP) values. This decision tree achieved 100% survival prediction and 81% mortality prediction. However, these models face several technical challenges and do not provide an out-of-the-box solution that can be deployed for other populations, as has been reported in the "Matters Arising" section of Yan et al. Here, we show the limitations of this model by deploying it on one of the largest datasets of COVID-19 patients with detailed clinical parameters collected in India.
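A clinically operable rule of this kind reduces to a few nested threshold checks on the three admission-day values. The Python sketch below illustrates that structure; it is not the authors' code. The cut-offs (LDH ≥ 365 U/L, hs-CRP < 41.2 mg/L, lymphocytes ≥ 14.7%) and the branch order are our assumption of Yan et al.'s published rule and should be checked against that paper. The names `AdmissionLabs` and `predict_outcome` are hypothetical. The function returns `None` instead of imputing when a value needed on the decision path is missing, because incomplete test panels are a practical obstacle in resource-limited hospitals.

```python
from dataclasses import dataclass
from typing import Optional

# ASSUMED thresholds: our reading of Yan et al.'s single-tree rule.
# Verify against the original paper before using these values.
LDH_CUTOFF = 365.0        # lactic dehydrogenase, U/L
HS_CRP_CUTOFF = 41.2      # high-sensitivity C-reactive protein, mg/L
LYMPHOCYTE_CUTOFF = 14.7  # lymphocytes, % of white blood cells


@dataclass
class AdmissionLabs:
    """Hypothetical container for the three admission-day lab values."""
    ldh: Optional[float]
    hs_crp: Optional[float]
    lymphocyte_pct: Optional[float]


def predict_outcome(labs: AdmissionLabs) -> Optional[str]:
    """Walk the tree top-down; return None if a value needed on the path is missing."""
    if labs.ldh is None:
        return None
    if labs.ldh >= LDH_CUTOFF:        # LDH at or above cut-off: predicted death
        return "death"
    if labs.hs_crp is None:
        return None
    if labs.hs_crp < HS_CRP_CUTOFF:   # LDH below cut-off and low hs-CRP: predicted survival
        return "survival"
    if labs.lymphocyte_pct is None:
        return None
    # Elevated hs-CRP: the lymphocyte percentage decides the outcome.
    return "survival" if labs.lymphocyte_pct >= LYMPHOCYTE_CUTOFF else "death"


print(predict_outcome(AdmissionLabs(ldh=420.0, hs_crp=None, lymphocyte_pct=None)))
```

In this structure a patient with LDH above the cut-off receives a prediction even without hs-CRP or lymphocyte results, whereas a patient below it needs the later tests, so missing tests affect some branches more than others.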
Machine Learning
What problem does this paper attempt to address?
This paper evaluates whether an existing COVID-19 mortality prediction model remains applicable and accurate in the Indian population. Specifically, the authors examine how the machine-learning-based mortality prediction model proposed by Yan et al. performs on an Indian patient dataset. The model was originally trained on the clinical parameters of Chinese patients, and this study tests whether it can be applied directly to patient groups from different geographical and demographic contexts, particularly the relatively resource-scarce Indian medical environment. The main challenges identified in the paper are:

1. **Limitations of data collection**: Many patients did not undergo all the required clinical tests, so their records lack complete clinical parameters. This is a significant obstacle to deploying machine-learning models in resource-limited settings.
2. **Differences in laboratory test standards**: Laboratory testing standards can vary between countries and regions. For example, the measurement methods and reference ranges for lactate dehydrogenase (LDH) may differ from region to region, so the data must be appropriately standardized.
3. **Differences in population genetics**: Genetic differences between populations can produce significant differences in the values of certain biochemical parameters, which may affect the model's predictive performance.

By analyzing data from Indian patients, the authors found that the model of Yan et al. performed well in predicting mortality but poorly in predicting survival and infection severity. This result underscores the need to account for multiple sources of bias, including technical variability, population genetics, demography, and socioeconomic factors, when developing machine-learning models to predict patient outcomes.
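These challenges translate into concrete checks when a fixed rule is deployed on a new cohort. The Python sketch below is illustrative only. It rescales a local lab value onto a reference laboratory's scale by matching upper limits of normal (ULN), one simple harmonization approach swapped in here for illustration and not prescribed by the paper. It then reports coverage (the share of patients with enough data to receive a prediction) and recall per true outcome class, which separates performance on mortality from performance on survival. All function names, ULN values, and toy records are hypothetical and are not data from the study.

```python
from typing import Dict, List, Optional, Tuple


def rescale_to_reference(value: Optional[float], local_uln: float,
                         reference_uln: float) -> Optional[float]:
    """Express a local lab value as a multiple of the local upper limit of
    normal (ULN), then map it onto the reference lab's ULN. Missing stays missing."""
    if value is None:
        return None
    return value / local_uln * reference_uln


def per_class_recall(pairs: List[Tuple[str, Optional[str]]]) -> Dict[str, float]:
    """Fraction of patients in each true outcome class that were predicted correctly."""
    totals: Dict[str, int] = {}
    hits: Dict[str, int] = {}
    for truth, pred in pairs:
        totals[truth] = totals.get(truth, 0) + 1
        if truth == pred:
            hits[truth] = hits.get(truth, 0) + 1
    return {label: hits.get(label, 0) / n for label, n in totals.items()}


def evaluate(records: List[Tuple[str, Optional[str]]]) -> Tuple[float, Dict[str, float]]:
    """records: (true outcome, prediction or None when data were incomplete).
    Returns coverage and per-class recall over the covered patients only."""
    covered = [(t, p) for t, p in records if p is not None]
    coverage = len(covered) / len(records) if records else 0.0
    return coverage, per_class_recall(covered)


# Toy inputs (hypothetical, not study data).
print(rescale_to_reference(480.0, local_uln=240.0, reference_uln=250.0))
toy = [("death", "death"), ("death", "death"), ("survival", "death"),
       ("survival", "survival"), ("survival", None)]
print(evaluate(toy))
```

With these toy records the rule covers 80% of patients and recovers every death but only half of the survivors, the kind of asymmetry that a single overall accuracy figure would hide.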