Early detection of disease outbreaks and non-outbreaks using incidence data

Shan Gao, Amit K. Chakraborty, Russell Greiner, Mark A. Lewis, Hao Wang
2024-04-13
Abstract: Forecasting the occurrence and absence of novel disease outbreaks is essential for disease management. Here, we develop a general model, with no real-world training data, that accurately forecasts outbreaks and non-outbreaks. We propose a novel framework, using a feature-based time series classification method to forecast outbreaks and non-outbreaks. We tested our methods on synthetic data from a Susceptible-Infected-Recovered model for slowly changing, noisy disease dynamics. Outbreak sequences give a transcritical bifurcation within a specified future time window, whereas non-outbreak (null bifurcation) sequences do not. We identified incipient differences in time series of infectives leading to future outbreaks and non-outbreaks. These differences are reflected in 22 statistical features and 5 early warning signal indicators. Classifier performance, given by the area under the receiver-operating curve, ranged from 0.99 for large expanding windows of training data to 0.7 for small rolling windows. Real-world performances of classifiers were tested on two empirical datasets, COVID-19 data from Singapore and SARS data from Hong Kong, with two classifiers exhibiting high accuracy. In summary, we showed that there are statistical features that distinguish outbreak and non-outbreak sequences long before outbreaks occur. We could detect these differences in synthetic and real-world data sets, well before potential outbreaks occur.
Machine Learning, Dynamical Systems, Populations and Evolution, Applications
What problem does this paper attempt to address?
The problem that this paper attempts to solve is **how to use incidence data to predict disease outbreaks and non-outbreak events early**. Specifically, the researchers have developed a general model aimed at accurately predicting the outbreaks and non-outbreaks of newly emerging diseases. The importance of this problem lies in:

1. **Requirements for public health management**: Predicting disease outbreaks in advance can help public health departments take preventive measures in a timely manner and reduce the risk of disease transmission.
2. **Optimization of resource allocation**: Through accurate prediction, medical resources can be allocated reasonably and unnecessary waste can be avoided.
3. **Response to public health emergencies**: For new infectious diseases, due to the lack of historical data, traditional mathematical modeling methods may not be effective enough. Therefore, a new method that does not depend on real-world training data is needed.

### Research background

The occurrence and recurrence of infectious diseases are common worldwide. Some of them have a high mortality rate or are highly contagious, while others may be non-fatal and disappear quickly. Balancing the risks brought by new infectious diseases with the costs and consequences of preventive measures has always been a long-term challenge. Especially when facing new diseases such as COVID-19, the initial response often lacks sufficient guidance and certainty.

### Solution

To address the above problems, the researchers proposed a feature-based time-series classification framework, trained with synthetic data and verified on real-world data. The following are the main contributions of this study:

1. **Proposed a new framework**: This framework uses a feature-based time-series classification method to predict disease outbreak and non-outbreak events.
2. **Trained with synthetic data**: By introducing different types of noise (white noise, multiplicative environmental noise, demographic noise) into the Susceptible-Infected-Recovered (SIR) model, a large amount of simulated data was generated.
3. **Extracted key features**: 22 statistical features and 5 early-warning signal indicators were extracted from the time series to distinguish between outbreak and non-outbreak sequences.
4. **Evaluated classifier performance**: The performance of the classifier was evaluated by calculating the area under the receiver operating characteristic curve (AUC), and the influence of different-length time windows on the prediction accuracy was tested.

### Experimental results

The researchers trained 32 classifiers and tested them on multiple datasets, including synthetic data and real-world COVID-19 and SARS data. The experimental results show that the classifiers exhibit high accuracy in predicting disease outbreaks, especially when approaching the critical point. In addition, the study also explored the performance differences between different feature extraction methods and machine-learning algorithms.

### Conclusion

The study shows that by analyzing specific statistical features in the time series, potential outbreak and non-outbreak events can be identified relatively early before the disease outbreak. This method is not only applicable to synthetic data but also has been verified in real-world data, providing strong support for future disease prevention and control.
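
### Illustrative sketch

The pipeline summarized above (noisy SIR simulation, rolling-window indicators, AUC scoring) can be illustrated with a minimal Python sketch. This is not the authors' code. It assumes an SIR model with births and deaths, additive white noise only, and a linearly ramping transmission rate, so that R0 = beta / (gamma + mu) either crosses 1 (outbreak) or stays below it (non-outbreak). The function names, parameter values, and the choice of variance and lag-1 autocorrelation as indicators are all hypothetical stand-ins; they are two commonly used early-warning signals, not necessarily among the paper's five.

```python
import numpy as np


def simulate_infectives(beta_start, beta_end, gamma=0.1, mu=0.01,
                        n_steps=500, dt=1.0, sigma=1e-3, seed=0):
    """Euler-Maruyama simulation of a noisy SIR model with births and deaths.

    The transmission rate ramps linearly from beta_start to beta_end, so
    R0 = beta / (gamma + mu) drifts slowly. All defaults are illustrative
    assumptions, not values taken from the paper.
    """
    rng = np.random.default_rng(seed)
    betas = np.linspace(beta_start, beta_end, n_steps)
    s, i = 1.0 - 1e-3, 1e-3  # susceptible and infective population fractions
    infectives = np.empty(n_steps)
    for t, beta in enumerate(betas):
        new_infections = beta * s * i
        ds = mu * (1.0 - s) - new_infections
        di = new_infections - (gamma + mu) * i
        noise = sigma * np.sqrt(dt) * rng.standard_normal()  # additive white noise
        s = min(max(s + ds * dt, 0.0), 1.0)
        i = min(max(i + di * dt + noise, 0.0), 1.0)
        infectives[t] = i
    return infectives


def lag1_autocorrelation(x):
    """Lag-1 autocorrelation; returns 0.0 for a constant series."""
    x = np.asarray(x, dtype=float)
    centered = x - x.mean()
    denom = np.dot(centered, centered)
    if denom == 0.0:
        return 0.0
    return float(np.dot(centered[:-1], centered[1:]) / denom)


def rolling_indicators(x, window):
    """Variance and lag-1 autocorrelation over each rolling window."""
    x = np.asarray(x, dtype=float)
    rows = []
    for end in range(window, len(x) + 1):
        segment = x[end - window:end]
        rows.append((segment.var(), lag1_autocorrelation(segment)))
    return np.array(rows)


def auc(scores, labels):
    """Area under the ROC curve via the Mann-Whitney statistic (ties count half)."""
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=bool)
    pos, neg = scores[labels], scores[~labels]
    greater = (pos[:, None] > neg[None, :]).sum()
    ties = (pos[:, None] == neg[None, :]).sum()
    return float((greater + 0.5 * ties) / (len(pos) * len(neg)))


if __name__ == "__main__":
    outbreak = simulate_infectives(0.05, 0.15, seed=1)      # R0 ramps past 1
    non_outbreak = simulate_infectives(0.05, 0.08, seed=1)  # R0 stays below 1
    print(rolling_indicators(outbreak, window=100)[-1])
    print(rolling_indicators(non_outbreak, window=100)[-1])
```

In a fuller version, the rolling indicators would feed a classifier trained on many labelled synthetic sequences, and `auc` would score its predicted outbreak probabilities against the true labels.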