A novel approach for predicting epidemiological forecasting parameters based on real-time signals and Data Assimilation

Romain Molinas, César Quilodrán Casas, Rossella Arcucci, Ovidiu Şerban
2023-07-04
Abstract: This paper proposes a novel approach to predict epidemiological parameters by integrating new real-time signals from various sources of information, such as novel social media-based population density maps and Air Quality data. We implement an ensemble of Convolutional Neural Networks (CNN) models using various data sources and fusion methodology to build robust predictions and simulate several dynamic parameters that could improve the decision-making process for policymakers. Additionally, we used data assimilation to estimate the state of our system from fused CNN predictions. The combination of meteorological signals and social media-based population density maps improved the performance and flexibility of our prediction of the COVID-19 outbreak in London. While the proposed approach outperforms standard models, such as compartmental models traditionally used in disease forecasting (SEIR), generating robust and consistent predictions allows us to increase the stability of our model while increasing its accuracy.
Machine Learning, Neural and Evolutionary Computing, Social and Information Networks
### What problems does this paper attempt to solve?

This paper proposes a new method to predict epidemiological parameters by integrating multiple real-time signals with data assimilation (DA) techniques. Specifically, the goals of the paper are:

1. **Develop a novel model**: Predict key parameters of infectious diseases such as COVID-19, for example the number of infected people and the number of deaths.
2. **Fuse multi-source data**: Use convolutional neural networks (CNNs) to extract features from data from different sources and generate robust predictions through a fusion architecture. These data sources include:
   - **Social media data**: Used as a real-time signal of population behavior, especially in high-density areas.
   - **Meteorological and air quality data**: Used to characterize the epidemic spread in the London area.
3. **Improve the flexibility and robustness of prediction**: Through ensemble learning and data assimilation, reduce the uncertainties introduced by different data sources and enhance the stability and accuracy of the model.
4. **Improve decision support**: Provide more accurate and timely epidemiological predictions to help policymakers make better decisions.

### Method overview

To achieve these goals, the paper proposes the following key techniques:

- **CNN fusion architecture**: Multiple CNN models are combined, each focusing on a different type of data stream (such as time-series data and spatial data), and their features are fused through element-wise multiplication:

  \[ c_k = \left(\sum_{i=1}^{M} \alpha_k^i a_i + \gamma_k\right) \odot \left(\sum_{j=1}^{N} \beta_k^j b_j + \delta_k\right) \]

  where \(a_i\) and \(b_j\) are the feature maps of the last convolutional layer of the temporal and spatial networks respectively, \(\alpha\) and \(\beta\) are learnable weights, and \(\gamma\) and \(\delta\) are bias terms.
- **Data Assimilation**: The Stochastic Ensemble Kalman Filter (EnKF) combines the observed data with the numerical model output to obtain the best estimate of the system state. This helps to deal with the noise and uncertainty in real-time signals.

  Observation model:

  \[ y_k = H_k(x_k) + \epsilon_y^k \]

  State evolution model:

  \[ x_{k+1} = G_k(x_k) + \epsilon_x^k \]

  where \(\epsilon_y^k \sim N(0, R_k)\) and \(\epsilon_x^k \sim N(0, P_k)\) are the observation error and the model error respectively, both assumed to be Gaussian.

Through these methods, the paper attempts to overcome the limitations of traditional epidemiological models (such as the SEIR model) in data acquisition and processing, and to provide a more flexible and accurate prediction tool.
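The multiplicative fusion formula from the method overview can be sketched in NumPy as follows. This is a minimal illustration, not the authors' implementation: the function name `multiplicative_fusion`, the array shapes, and the use of random values in place of learned weights are all assumptions. It also assumes that the feature maps of both networks share the same spatial size, so that the element-wise product is defined.

```python
import numpy as np


def multiplicative_fusion(a, b, alpha, beta, gamma, delta):
    """Compute c_k = (sum_i alpha[k, i] * a_i + gamma_k) * (sum_j beta[k, j] * b_j + delta_k).

    Shapes (assumed for this sketch):
      a:     (M, H, W) feature maps from the temporal network
      b:     (N, H, W) feature maps from the spatial network
      alpha: (K, M), beta: (K, N) mixing weights
      gamma: (K,),   delta: (K,)  bias terms
    Returns an array of shape (K, H, W).
    """
    # Weighted sum over the M temporal feature maps, plus a per-output bias.
    temporal = np.tensordot(alpha, a, axes=([1], [0])) + gamma[:, None, None]
    # Weighted sum over the N spatial feature maps, plus a per-output bias.
    spatial = np.tensordot(beta, b, axes=([1], [0])) + delta[:, None, None]
    # Element-wise (Hadamard) product fuses the two streams.
    return temporal * spatial


# Toy usage with random values standing in for learned parameters.
rng = np.random.default_rng(0)
M, N, K, H, W = 3, 5, 4, 8, 8
fused = multiplicative_fusion(
    rng.normal(size=(M, H, W)),
    rng.normal(size=(N, H, W)),
    rng.normal(size=(K, M)),
    rng.normal(size=(K, N)),
    rng.normal(size=K),
    rng.normal(size=K),
)
print(fused.shape)  # (4, 8, 8)
```

In a trained model the weights and biases would be learned jointly with the two CNNs; here they are fixed arrays so that the arithmetic of the formula is easy to check.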
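The observation and state evolution models above can be turned into one forecast/analysis cycle of a stochastic EnKF, sketched below. This is a hedged illustration under simplifying assumptions rather than the paper's code: the observation operator is taken to be a linear matrix `H`, the function names `enkf_forecast` and `enkf_analysis` are hypothetical, and the forecast covariance is estimated from the ensemble itself (named `cov_f` here to avoid confusion with the model error covariance \(P_k\)).

```python
import numpy as np


def enkf_forecast(ensemble, G, P, rng):
    """Propagate each member with x_{k+1} = G(x_k) + eps_x, eps_x ~ N(0, P).

    ensemble: (n_state, n_members); G maps a 1-D state vector to a 1-D state vector.
    """
    n_state, n_members = ensemble.shape
    propagated = np.stack([G(ensemble[:, m]) for m in range(n_members)], axis=1)
    noise = rng.multivariate_normal(np.zeros(n_state), P, size=n_members).T
    return propagated + noise


def enkf_analysis(ensemble, y, H, R, rng):
    """Stochastic EnKF update for a linear observation model y = H x + eps_y, eps_y ~ N(0, R).

    ensemble: (n_state, n_members) forecast ensemble
    y:        (n_obs,) observation vector
    H:        (n_obs, n_state) linear observation operator (an assumption of this sketch)
    R:        (n_obs, n_obs) observation error covariance
    """
    n_members = ensemble.shape[1]
    # Sample covariance of the forecast ensemble.
    anomalies = ensemble - ensemble.mean(axis=1, keepdims=True)
    cov_f = anomalies @ anomalies.T / (n_members - 1)
    # Kalman gain K = cov_f H^T (H cov_f H^T + R)^{-1}, computed via a linear solve.
    innovation_cov = H @ cov_f @ H.T + R
    gain = np.linalg.solve(innovation_cov, H @ cov_f).T
    # "Stochastic" variant: each member assimilates its own perturbed observation.
    perturbed_obs = y[:, None] + rng.multivariate_normal(
        np.zeros(len(y)), R, size=n_members
    ).T
    return ensemble + gain @ (perturbed_obs - H @ ensemble)
```

A typical cycle would call `enkf_forecast` to advance the ensemble to the next time step and then `enkf_analysis` once a new observation arrives; the mean of the returned ensemble serves as the state estimate and its spread as a measure of uncertainty.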