Prediction of Influenza Epidemics at the Province Level in China Using Search Query from "Haosou".

Yuanqiang Zou, Yousong Peng, Li Lu, Taijiao Jiang, Lizong Deng
DOI: https://doi.org/10.1109/fskd.2015.7382157
2015-01-01
Abstract: Influenza (flu) has caused, and will continue to cause, substantial morbidity and mortality in human populations. Surveillance is critical for its prevention and control. Traditional influenza surveillance relies on hierarchical surveillance networks, which are costly and time-consuming. Internet-based methods are a useful supplement to these traditional approaches. Here, we attempted to predict daily influenza epidemics, represented as the number of influenza-like illness (ILI) cases, in three representative provinces in China (Beijing, Shanghai and Guangdong) using data from the "Haosou" search engine. Search queries for several influenza-related keywords showed weak to moderate correlations with influenza epidemics in these provinces. Two kinds of statistical models, multiple linear regression and regression trees, were built to predict influenza epidemics from the search queries for these keywords. Both achieved moderate performance in cross-validation and retrospective testing for each province, with Pearson correlation coefficients between predicted and observed influenza epidemics ranging from 0.35 to 0.78. Further, unified models were built to predict influenza epidemics across all three provinces. Their poor performance suggests that building a single model for all provinces in China is difficult. Overall, this work shows the potential of predicting daily influenza epidemics at the province level in China from search engine data. More effort is needed to improve the statistical models before they can be used in practical influenza surveillance.
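As a rough illustration of the evaluation described in the abstract, the following minimal sketch fits a multiple linear regression on keyword query counts and scores it by the Pearson correlation between cross-validated predictions and observed ILI counts. Everything specific here is an assumption made for illustration: the data are synthetic, the number of keywords, the fold count k=5, the contiguous fold layout, and the helper names (fit_linear, cross_validated_pearson) are hypothetical and not taken from the paper. The regression-tree model is not shown.

```python
import numpy as np

def fit_linear(X, y):
    """Ordinary least squares with an intercept; returns [intercept, slopes...]."""
    A = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef

def predict_linear(coef, X):
    """Apply fitted coefficients to new keyword counts."""
    A = np.column_stack([np.ones(len(X)), X])
    return A @ coef

def pearson(a, b):
    """Pearson correlation coefficient between two 1-D arrays."""
    return float(np.corrcoef(a, b)[0, 1])

def cross_validated_pearson(X, y, k=5):
    """Contiguous k-fold CV: pool out-of-fold predictions, then correlate
    them with the observed values. The fold scheme is an assumption."""
    n = len(y)
    preds = np.empty(n)
    for test_idx in np.array_split(np.arange(n), k):
        train_mask = np.ones(n, dtype=bool)
        train_mask[test_idx] = False
        coef = fit_linear(X[train_mask], y[train_mask])
        preds[test_idx] = predict_linear(coef, X[test_idx])
    return pearson(preds, y)

# Synthetic stand-in data (NOT the paper's data): daily counts for three
# hypothetical keywords and a noisy linear ILI signal built from them.
rng = np.random.default_rng(0)
n_days, n_keywords = 200, 3
queries = rng.poisson(lam=50, size=(n_days, n_keywords)).astype(float)
ili = 5.0 + queries @ np.array([0.4, 0.2, 0.1]) + rng.normal(0, 3.0, n_days)

r = cross_validated_pearson(queries, ili, k=5)
print(f"cross-validated Pearson r = {r:.2f}")
```

For daily time series, contiguous folds keep neighbouring days together in the same fold, which is one common way to limit leakage between training and test data; whether the authors did this is not stated in the abstract.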