A Case Study with Application to COVID-19 Pandemics

Emerson Vilar de Oliveira, Dunfrey Pires Aragão, Luiz Marcos Garcia Gonçalves
DOI: https://doi.org/10.3390/ijerph21040497
IF: 4.614
2024-04-19
International Journal of Environmental Research and Public Health
Abstract: The SARS-CoV-2 global pandemic prompted governments, institutions, and researchers to investigate its impact, developing strategies based on general indicators to make the most precise predictions possible. Approaches based on epidemiological models were used, but their forecasts carried uncertainty due to insufficient or missing data. Beyond the lack of data, machine-learning models including random forest, support vector regression, LSTM, and auto-encoders, as well as traditional time-series models such as Prophet and ARIMA, were employed in the task, achieving notable results but with limited effectiveness. Some of these methodologies have precision constraints in dealing with multi-variable inputs, which are important for problems like pandemics that require short- and long-term forecasting. Given this gap, we propose a novel approach for time-series prediction based on stacking auto-encoder structures, using three variations of the same model for the training step and weight adjustment to evaluate its forecasting performance. We conducted comparison experiments with previously published data on COVID-19 cases, deaths, temperature, humidity, and air quality index (AQI) in São Paulo City, Brazil. Additionally, we used the percentage of COVID-19 cases from the top ten affected countries worldwide until May 4th, 2020. The results show decreases of 80.7% and 10.3% in RMSE on the entire and test data, respectively, over the distribution of 50 trial-trained models, compared to the first experiment comparison. Also, model type #3 achieved the fourth-best overall ranking, outperforming the NBEATS, Prophet, and GluonTS time-series models in the second experiment comparison. This model shows promising forecast capacity and versatility across different input dataset lengths, making it a prominent forecasting model for time-series tasks.
public, environmental & occupational health, environmental sciences
What problem does this paper attempt to address?
The paper mainly aims to address the following issues:

### Research Background and Objectives

- **Research Background**: The global COVID-19 pandemic prompted governments, institutions, and researchers to invest significant effort in studying its impact and in developing strategies based on general indicators to make the most accurate predictions possible. Traditional epidemiological models produce uncertain forecasts when data are incomplete or missing.
- **Research Objectives**: Propose a new Auto-Regressive Multi-Variable Modified Auto-Encoder for multivariate time series prediction, specifically targeting COVID-19 pandemic predictions.

### Specific Issues

- **Problem Definition**: Existing methods, including Random Forest, Support Vector Regression, Long Short-Term Memory networks (LSTM), Auto-Encoders, and traditional time series models (such as Prophet and ARIMA), have achieved some success but face accuracy limitations when handling multivariate inputs. This matters for problems such as pandemics, which require both short-term and long-term predictions.
- **Solution**: The paper proposes a new method based on a stacked auto-encoder structure. Three variations of the same model are used for the training step and weight adjustment, with the aim of improving prediction performance.

### Experimental Design

- **Experimental Data**: The experiments used data from São Paulo, Brazil, including COVID-19 case numbers, death counts, temperature, humidity, and Air Quality Index (AQI). Additionally, data on the percentage of COVID-19 cases from the ten most affected countries worldwide up to May 4, 2020, were used.
- **Evaluation Metrics**: Model performance was evaluated by comparing the Root Mean Square Error (RMSE) on the entire dataset and on the test data across different models. In the second comparison, the proposed model (type #3) achieved the fourth-best overall ranking, ahead of NBEATS, Prophet, and GluonTS.

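
As a minimal sketch of the evaluation metric named above, the snippet below computes RMSE and the percentage decrease of one model's RMSE relative to a baseline. The function names and the toy values are hypothetical and are not taken from the paper; the paper's reported figures come from 50 trained trials, which this example does not reproduce.

```python
import math


def rmse(y_true, y_pred):
    """Root mean square error between two equal-length sequences."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    squared_errors = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


def percent_decrease(baseline, candidate):
    """Relative reduction of `candidate` with respect to `baseline`, in percent."""
    return 100.0 * (baseline - candidate) / baseline


# Hypothetical toy data, for illustration only.
y_true = [10.0, 12.0, 15.0, 11.0]
baseline_pred = [14.0, 8.0, 19.0, 7.0]     # every prediction is off by 4
candidate_pred = [11.0, 11.0, 16.0, 10.0]  # every prediction is off by 1

baseline_rmse = rmse(y_true, baseline_pred)
candidate_rmse = rmse(y_true, candidate_pred)
print(percent_decrease(baseline_rmse, candidate_rmse))  # 75.0
```

A lower RMSE is better, so a positive percentage decrease means the candidate model improved on the baseline.
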
### Conclusion

- **Main Contribution**: The proposed method showed promising forecast capacity and versatility across different input dataset lengths, making it a strong candidate model for time series tasks. It is particularly relevant to public health applications that require high-accuracy predictions, such as COVID-19 pandemic forecasting.
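
To illustrate what a multi-variable, auto-regressive forecasting setup with a configurable input length can look like, here is a small sliding-window sketch. It is a generic preprocessing technique and an assumption on our part, not the authors' pipeline; the function `make_windows`, its parameters, and the sample values are hypothetical.

```python
def make_windows(series, input_len, horizon=1, target_col=0):
    """Split a multi-variable series into (input window, future target) pairs.

    `series` is a list of rows, one row per time step, each row holding the
    values of all variables at that step. Each input window covers
    `input_len` consecutive steps; the target is the `target_col` variable
    over the next `horizon` steps.
    """
    inputs, targets = [], []
    last_start = len(series) - input_len - horizon
    for start in range(last_start + 1):
        window = series[start:start + input_len]
        future = series[start + input_len:start + input_len + horizon]
        inputs.append([list(row) for row in window])
        targets.append([row[target_col] for row in future])
    return inputs, targets


# Hypothetical rows: [daily cases, temperature, humidity]
series = [
    [100, 24.0, 70.0],
    [120, 25.0, 68.0],
    [150, 23.5, 72.0],
    [170, 22.0, 75.0],
    [160, 21.0, 74.0],
]

X, y = make_windows(series, input_len=3, horizon=1)
print(len(X))  # 2
print(y)       # [[170], [160]]
```

Changing `input_len` changes how much history each prediction sees, which is one way to probe a model's behavior across different input lengths.
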