Use Internet Search Data to Accurately Track State-Level Influenza Epidemics

Shihao Yang, Shaoyang Ning, S. C. Kou
DOI: https://doi.org/10.48550/arXiv.2006.02927
2020-12-24
Abstract: For epidemics control and prevention, timely insights of potential hot spots are invaluable. Alternative to traditional epidemic surveillance, which often lags behind real time by weeks, big data from the Internet provide important information of the current epidemic trends. Here we present a methodology, ARGOX (Augmented Regression with GOogle data CROSS space), for accurate real-time tracking of state-level influenza epidemics in the United States. ARGOX combines Internet search data at the national, regional and state levels with traditional influenza surveillance data from the Centers for Disease Control and Prevention, and accounts for both the spatial correlation structure of state-level influenza activities and the evolution of people's Internet search pattern. ARGOX achieves on average 28% error reduction over the best alternative for real-time state-level influenza estimation for 2014 to 2020. ARGOX is robust and reliable and can be potentially applied to track county- and city-level influenza activity and other infectious diseases.
Applications
What problem does this paper attempt to address?
This paper addresses real-time tracking of influenza epidemics at the state level in the United States. Traditional influenza surveillance systems, such as the surveillance network of the Centers for Disease Control and Prevention (CDC), typically lag real time by weeks, which is too slow for public health decision-making, especially during outbreaks or pandemics. The paper therefore proposes a statistical model, ARGOX (Augmented Regression with GOogle data CROSS space), that combines Internet search data with traditional influenza surveillance data to track influenza activity in each U.S. state accurately and in real time.

### Main problems solved by the paper:

1. **Real-time performance**: Traditional influenza surveillance methods have significant time delays, while ARGOX provides near-real-time estimates of influenza activity by using big data from the Internet, especially search engine data.
2. **Accuracy**: ARGOX improves the accuracy of influenza activity estimates and provides a more reliable way to integrate data across multiple geographical levels (national, regional, state).
3. **Multi-resolution data fusion**: ARGOX combines public data at different geographical levels, namely national-, regional- and state-level Internet search data together with traditional influenza surveillance data, which improves the robustness and applicability of the model.
4. **Spatial correlation**: ARGOX accounts for the spatial correlation of influenza activity across states and for the evolution of people's Internet search patterns, which further improves the accuracy of its estimates.
5. **Model flexibility**: The ARGOX framework is flexible: it can incorporate information from other sources and resolutions, and it could be adapted to track other social, economic or public health events.
### Key points:

- **Data sources**: ARGOX combines Internet search data with traditional influenza surveillance data from the CDC.
- **Model design**: ARGOX works in two steps. The first step extracts information from Internet search data; the second step enhances the resulting estimates by integrating information across time and geographical levels.
- **Performance improvement**: In evaluations covering 2014 to 2020, ARGOX reduced the mean squared error (MSE) by an average of 28% compared to the best alternative method, and outperformed the benchmark method in all states.
- **Application prospects**: Beyond influenza tracking, ARGOX can potentially be extended to real-time monitoring of other diseases and socio-economic events.

Through these improvements, ARGOX gives public health officials more timely and accurate information on influenza activity, which supports better resource allocation and epidemic prevention and control.
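
To make the "28% error reduction" figure concrete, here is a minimal sketch of how a relative MSE reduction over a baseline could be computed per state and then averaged. It assumes the reduction is defined as 1 minus the ratio of the two MSEs and that the average is a plain mean over states; the summary above does not spell out either detail. The state labels, function names and numbers are all hypothetical and are not data from the paper.

```python
def mse(estimates, truth):
    """Mean squared error between an estimate series and a reference series."""
    if len(estimates) != len(truth):
        raise ValueError("series must have the same length")
    return sum((e - t) ** 2 for e, t in zip(estimates, truth)) / len(truth)


def relative_error_reduction(model_mse, baseline_mse):
    """Fraction by which model_mse is lower than baseline_mse (0.28 means 28%)."""
    return 1.0 - model_mse / baseline_mse


# Made-up weekly influenza activity values for two placeholder states.
truth = {"A": [1.0, 2.0, 3.0, 2.0], "B": [0.5, 1.5, 2.5, 1.5]}
model = {"A": [1.1, 2.1, 2.9, 2.0], "B": [0.6, 1.4, 2.4, 1.6]}
baseline = {"A": [1.2, 2.2, 2.8, 2.2], "B": [0.7, 1.3, 2.3, 1.7]}

# Per-state reduction of the model's MSE relative to the baseline's MSE.
reductions = {
    state: relative_error_reduction(
        mse(model[state], truth[state]),
        mse(baseline[state], truth[state]),
    )
    for state in truth
}

# Assumed aggregation: unweighted mean of the per-state reductions.
average_reduction = sum(reductions.values()) / len(reductions)

for state, value in reductions.items():
    print(f"state {state}: {value:.1%} lower MSE than baseline")
print(f"average reduction: {average_reduction:.1%}")
```

Averaging per-state reductions treats every state equally regardless of its overall influenza level; pooling squared errors across states before taking the ratio would be a different, equally plausible convention.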