Modeling Epidemic Spread: A Gaussian Process Regression Approach

Baike She, Lei Xin, Philip E. Paré, Matthew Hale
2024-09-17
Abstract: Modeling epidemic spread is critical for informing policy decisions aimed at mitigation. Accordingly, in this work we present a new data-driven method based on Gaussian process regression (GPR) to model epidemic spread. We bound the variance of the predictions made by GPR, which quantifies the impact of epidemic data on the proposed model. Next, we derive a high-probability error bound on the prediction error in terms of the distance between the training points and a testing point, the posterior variance, and the level of change in the spreading process, and we assess how the characteristics of the epidemic spread and infection data influence this error bound. We present examples that use GPR to model and predict epidemic spread by using real-world infection data gathered in the UK during the COVID-19 epidemic. These examples illustrate that, under typical conditions, the prediction for the next twenty days has 94.29% of the noisy data located within the 95% confidence interval, validating these predictions.
Machine Learning, Systems and Control, Physics and Society
What problem does this paper attempt to address?
The main problem this paper addresses is how to model and predict the spread of infectious diseases using Gaussian process regression (GPR). Specifically, the paper focuses on the following aspects:

1. **Develop a new model**: Use GPR to model the spread of infectious diseases and verify the effectiveness of this method theoretically and numerically.
2. **Quantify the influence of noise and data volume**: Analyze how noise and the number of data samples affect the prediction results.
3. **Establish a high-probability error bound**: Derive a high-probability bound on the prediction error, analyze how the data influence that error, and verify the result on real data.

### Specific problem description

1. **Problem 1**: Develop a new model that uses GPR to model the spread of infectious diseases, and demonstrate the effectiveness of this method.
   - **Solution**: The paper proposes a GPR-based method that directly studies the change in the number of infection cases without relying on a specific compartmental model (such as the SIR model). The method is validated on real COVID-19 infection data.
2. **Problem 2**: Quantify the influence of noise and data sample volume on the prediction results.
   - **Solution**: The paper derives an upper bound on the posterior variance and uses it to evaluate the influence of the infection data on the model. This helps explain how the quality and quantity of data affect prediction accuracy.
3. **Problem 3**: Establish a high-probability prediction error bound, analyze the influence of data on the prediction error, and verify this result on real data.
   - **Solution**: The paper derives a high-probability prediction error bound that accounts for the distance between the training points and a test point, the posterior variance, and the degree of change in the spreading process. The bound is checked against real COVID-19 data.

### Main contributions

- **Model innovation**: Propose a new method that uses GPR to directly model the change in the number of infection cases.
- **Theoretical analysis**: Provide an upper bound on the posterior variance and a high-probability prediction error bound, quantifying the influence of data quality and quantity on the prediction results.
- **Empirical verification**: Conduct an empirical analysis using COVID-19 infection data from the UK, verifying the effectiveness of the model and the accuracy of its predictions.

Through this work, the paper provides a new data-driven method for modeling and predicting the spread of infectious diseases, and provides a theoretical basis for the design of future epidemic control strategies.
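To make the GPR workflow summarized above concrete, the following is a minimal sketch rather than the authors' implementation. It fits a GP to noisy samples, predicts the next twenty days, and counts how many held-out noisy points fall inside the 95% interval. Everything specific here is an assumption made for illustration: the synthetic signal `true_signal` stands in for real infection data, the squared-exponential kernel and its fixed hyperparameters are not taken from the paper, and the helper names (`rbf`, `gp_fit`, `gp_predict`) are hypothetical. Only the Python standard library is used so the example runs without extra dependencies.

```python
import math
import random


def rbf(a, b, length=3.0, signal_var=1.0):
    """Squared-exponential kernel between two scalar inputs (assumed kernel)."""
    return signal_var * math.exp(-0.5 * ((a - b) / length) ** 2)


def cholesky(A):
    """Lower-triangular Cholesky factor of a symmetric positive-definite matrix."""
    n = len(A)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = A[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            L[i][j] = math.sqrt(s) if i == j else s / L[j][j]
    return L


def forward_sub(L, b):
    """Solve L y = b for lower-triangular L."""
    y = []
    for i in range(len(b)):
        y.append((b[i] - sum(L[i][k] * y[k] for k in range(i))) / L[i][i])
    return y


def backward_sub(L, y):
    """Solve L^T x = y for lower-triangular L."""
    n = len(y)
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (y[i] - sum(L[k][i] * x[k] for k in range(i + 1, n))) / L[i][i]
    return x


def gp_fit(xs, ys, noise_var):
    """Factor K + noise_var * I and precompute alpha = (K + noise_var * I)^{-1} y."""
    K = [[rbf(a, b) + (noise_var if i == j else 0.0)
          for j, b in enumerate(xs)] for i, a in enumerate(xs)]
    L = cholesky(K)
    alpha = backward_sub(L, forward_sub(L, ys))
    return L, alpha


def gp_predict(xs, L, alpha, x_star):
    """Posterior mean and variance of the latent function at x_star."""
    k_star = [rbf(x, x_star) for x in xs]
    mean = sum(k * a for k, a in zip(k_star, alpha))
    v = forward_sub(L, k_star)
    var = rbf(x_star, x_star) - sum(vi * vi for vi in v)
    return mean, max(var, 0.0)


def true_signal(t):
    """Hypothetical stand-in for the daily change in infection counts."""
    return math.sin(0.25 * t)


random.seed(0)
NOISE_VAR = 0.04  # assumed observation-noise variance
noise_std = math.sqrt(NOISE_VAR)

train_t = [float(t) for t in range(40)]      # days used for training
test_t = [float(t) for t in range(40, 60)]   # the next twenty days
train_y = [true_signal(t) + random.gauss(0.0, noise_std) for t in train_t]
test_y = [true_signal(t) + random.gauss(0.0, noise_std) for t in test_t]

L_full, alpha_full = gp_fit(train_t, train_y, NOISE_VAR)
preds = [gp_predict(train_t, L_full, alpha_full, t) for t in test_t]

# A noisy observation has predictive variance (posterior variance + noise variance).
inside = sum(abs(y - m) <= 1.96 * math.sqrt(v + NOISE_VAR)
             for (m, v), y in zip(preds, test_y))
coverage = inside / len(test_t)

# Posterior variance grows with distance from the training data ...
var_near = gp_predict(train_t, L_full, alpha_full, 40.0)[1]
var_far = gp_predict(train_t, L_full, alpha_full, 59.0)[1]

# ... and shrinks when more (nearby) training data are available.
L_half, alpha_half = gp_fit(train_t[:20], train_y[:20], NOISE_VAR)
var_half = gp_predict(train_t[:20], L_half, alpha_half, 40.0)[1]

print(f"fraction of held-out points inside the 95% interval: {coverage:.2f}")
print(f"posterior variance one day ahead: {var_near:.4f}")
print(f"posterior variance twenty days ahead: {var_far:.4f}")
print(f"one day ahead, trained on the first 20 days only: {var_half:.4f}")
```

The three printed variances illustrate, qualitatively, the quantities the summary mentions: the posterior variance never exceeds the prior variance, it increases as the test point moves away from the training points, and it decreases when more relevant data are observed. The coverage figure printed here comes from synthetic data and is not comparable to the 94.29% reported in the abstract.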