Deep learning-based approach for COVID-19 spread prediction
Silvino Pedro Cumbane,Győző Gidófalvi
DOI: https://doi.org/10.1007/s41060-024-00558-1
2024-06-10
International Journal of Data Science and Analytics
Abstract:Abstract Spread prediction models are vital tools to help health authorities and governments fight against infectious diseases such as COVID-19. The availability of historical daily COVID-19 cases, in conjunction with other datasets such as temperature and humidity (which are believed to play a key role in the spread of the disease), has opened a window for researchers to investigate the potential of different techniques to model and thereby expand our understanding of the factors (e.g., interaction or exposure resulting from mobility) that govern the underlying dynamics of the spread. Traditionally, infectious diseases are modeled using compartmental models such as the SIR model. However, this model shortcoming is that it does not account for mobility, and the resulting mixing or interactions, which we conjecture are a key factor in the dynamics of the spread. Statistical analysis and deep learning-based approaches such as autoregressive integrated moving average (ARIMA), gated recurrent units, variational autoencoder, long short-term memory (LSTM), convolution LSTM, stacked LSTM, and bidirectional LSTM have been tested with COVID-19 historical data to predict the disease spread mainly in medium- and high-income countries with good COVID-19 testing capabilities. However, few studies have focused on low-income countries with low access to COVID-19 testing and, hence, highly biased historical datasets. In addition to this, the arguable best model (BiLSTM) has not been tested with an arguably good set of features (people mobility data, temperature, and relative humidity). Therefore, in this study, the multi-layer BiLSTM model is tested with mobility trend data from Google, temperature, and relative humidity to predict daily COVID-19 cases in low-income countries. The performance of the proposed multi-layer BiLSTM is evaluated by comparing its RMSE with the one from multi-layer LSTM (with the same settings as BiLSTM) in four developing countries namely Mozambique, Rwanda, Nepal, and Myanmar. The proposed multi-layer BiLSTM outperformed the multi-layer LSTM in all four countries. The proposed multi-layer BiLSTM was also evaluated by comparing its root mean-squared error (RMSE) with multi-layer LSTM models, ARIMA- and stacked LSTM-based models in eight countries, namely Italy, Turkey, Australia, Brazil, Canada, Egypt, Japan, and the UK. Finally, the proposed multi-layer BiLSTM model was evaluated at the city level by comparing its average relative error with the other four models, namely the LSTM-based model considering multi-layer architecture, Google Cloud Forecasting, the LSTM-based model with mobility data only, and the LSTM-based model with mobility, temperature, and relative humidity data for 7 periods (of 28 days each) in six highly populated regions in Japan, namely Tokyo, Aichi, Osaka, Hyogo, Kyoto, and Fukuoka. The proposed multi-layer BiLSTM model outperformed the multi-layer LSTM model and other previous models by up to 1.6 and 0.6 times in terms of RMSE and ARE, respectively. Therefore, the proposed model enables more accurate forecasting of COVID-19 cases and can support governments and health authorities in their decisions, mainly in developing countries with limited resources.