A Case Study with Application to COVID-19 Pandemics

Emerson Vilar de Oliveira, Dunfrey Pires Aragão, Luiz Marcos Garcia Gonçalves

DOI: https://doi.org/10.3390/ijerph21040497

IF: 4.614

2024-04-19

International Journal of Environmental Research and Public Health

Abstract: The SARS-CoV-2 global pandemic prompted governments, institutions, and researchers to investigate its impact, developing strategies based on general indicators to make the most precise predictions possible. Approaches based on epidemiological models were used, but their forecasts carried uncertainty due to insufficient or missing data. Despite the lack of data, machine-learning models including random forest, support vector regression, LSTM, and auto-encoders, as well as traditional time-series models such as Prophet and ARIMA, were employed in the task, achieving notable results, though with limited effectiveness. Some of these methodologies have precision constraints in dealing with multi-variable inputs, which are important for problems like pandemics that require short- and long-term forecasting. Given this gap, we propose a novel approach for time-series prediction based on stacking auto-encoder structures, using three variations of the same model for the training step and weight adjustment to evaluate its forecasting performance. We conducted comparison experiments with previously published data on COVID-19 cases, deaths, temperature, humidity, and air quality index (AQI) in São Paulo City, Brazil. Additionally, we used the percentage of COVID-19 cases from the top ten affected countries worldwide until May 4th, 2020. The results show decreases of 80.7% and 10.3% in RMSE on the entire and test data, respectively, over the distribution of 50 trial-trained models, relative to the first comparison experiment. Also, model type #3 achieved the fourth-best overall ranking, outperforming the NBEATS, Prophet, and GluonTS time-series models in the second comparison experiment. This model shows promising forecast capacity and versatility across different input dataset lengths, making it a prominent forecasting model for time-series tasks.

public, environmental & occupational health, environmental sciences

What problem does this paper attempt to address?

The paper mainly aims to address the following issues:

### Research Background and Objectives

- **Research Background**: The global COVID-19 pandemic has prompted governments, institutions, and researchers to invest significant effort in studying its impact and developing strategies based on general indicators to make the most accurate predictions possible. Traditional epidemiological models produce uncertain forecasts when data are incomplete or missing.
- **Research Objectives**: Propose a new Auto-Regressive Multi-Variable Modified Auto-Encoder for multivariate time-series prediction, specifically targeting COVID-19 pandemic predictions.

### Specific Issues

- **Problem Definition**: Existing methods, including Random Forest, Support Vector Regression, Long Short-Term Memory networks (LSTM), Auto-Encoders, and traditional time-series models (such as Prophet and ARIMA), have achieved some success but face accuracy limitations when handling multivariate inputs. This matters for problems that require both short-term and long-term predictions, such as pandemics.
- **Solution**: The paper proposes a new method based on a stacked auto-encoder structure. Three variations of the same model are used for the training step and weight adjustment, and their forecasting performance is evaluated.

### Experimental Design

- **Experimental Data**: The experiments used data from São Paulo, Brazil, including COVID-19 case numbers, death counts, temperature, humidity, and Air Quality Index (AQI). Additionally, data on the percentage of COVID-19 cases from the 10 most affected countries worldwide up to May 4, 2020, were used.
- **Evaluation Metrics**: Model performance was evaluated with the Root Mean Square Error (RMSE) on the entire dataset and on the test data, over the distribution of 50 trial-trained models. In the first comparison, RMSE decreased by 80.7% on the entire data and 10.3% on the test data. In the second comparison, model type #3 achieved the fourth-best overall ranking, outperforming NBEATS, Prophet, and GluonTS.

### Conclusion

- **Main Contribution**: The proposed method showed promising forecast capacity and versatility across different input dataset lengths, making it a candidate forecasting model for time-series tasks, including public health applications such as COVID-19 pandemic prediction.
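The RMSE-based comparison described in the summary can be illustrated with a minimal sketch. This is not the authors' code: the function names `rmse` and `percent_decrease` and the toy numbers are hypothetical, chosen only to show how a percentage decrease in RMSE relative to a baseline is computed.

```python
import math


def rmse(y_true, y_pred):
    """Root mean square error between two equally long sequences."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    squared_errors = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


def percent_decrease(baseline_rmse, model_rmse):
    """Relative RMSE reduction of a model against a baseline, in percent."""
    return 100.0 * (baseline_rmse - model_rmse) / baseline_rmse


# Hypothetical toy data, not taken from the paper.
y_true = [10.0, 12.0, 15.0, 20.0]
baseline_pred = [8.0, 14.0, 13.0, 22.0]  # every error is +/-2, so RMSE = 2
model_pred = [9.0, 13.0, 14.0, 21.0]     # every error is +/-1, so RMSE = 1

baseline_rmse = rmse(y_true, baseline_pred)
model_rmse = rmse(y_true, model_pred)
print(percent_decrease(baseline_rmse, model_rmse))  # 50.0
```

When several independently trained models are compared, as with the 50 trials mentioned in the abstract, one plausible approach is to compute the RMSE per trial and compare summary statistics of the resulting distribution; the exact aggregation used in the paper is not stated here.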

Short-term forecasting COVID-19 cumulative confirmed cases: Perspectives for Brazil

Prediction of COVID-19 Data Using Improved ARIMA-LSTM Hybrid Forecast Models

COVID-19 Pandemic Prediction using Time Series Forecasting Models

Predicting COVID-19 cases in various scenarios using RNN-LSTM models aided by adaptive linear regression to identify data anomalies

Global Short-Term Forecasting of Covid-19 Cases

Comparison of Traditional and Hybrid Time Series Models for Forecasting COVID-19 Cases

Backtesting the predictability of COVID-19

Prediction and analysis of COVID-19 daily new cases and cumulative cases: times series forecasting and machine learning models

Time Series Analysis and Forecasting of COVID-19 Cases Using LSTM and ARIMA Models

Forecasting COVID-19 Pandemic Using Prophet, ARIMA, and Hybrid Stacked LSTM-GRU Models in India

Advanced forecasting of COVID-19 epidemic: Leveraging ensemble models, advanced optimization, and decomposition techniques

Deep learning-based approach for COVID-19 spread prediction

A Study of Data-driven Methods for Adaptive Forecasting of COVID-19 Cases

Forecasting COVID‐19 cases using dynamic time warping and incremental machine learning methods

Enhancing COVID-19 Case Forecasting in the United States: A Comparative Analysis of ARIMA, SARIMA, and RNN Models with Grid Search Optimization

Application of machine learning time series analysis for prediction COVID-19 pandemic

Analysis of learning curves in predictive modeling using exponential curve fitting with an asymptotic approach

Novel cost-effective method for forecasting COVID-19 and hospital occupancy using deep learning

A computational tool for trend analysis and forecast of the COVID-19 pandemic

Meteorological and human mobility data on predicting COVID-19 cases by a novel hybrid decomposition method with anomaly detection analysis: a case study in the capitals of Brazil