Abstract: Estimation of influenza-like illness (ILI) using search trends activity was intended to supplement traditional surveillance systems, and was a motivation behind the development of Google Flu Trends (GFT). However, several studies have previously reported large errors in GFT estimates of ILI in the US. Following the recent release of time-stamped surveillance data, which better reflects real-time operational scenarios, we reanalyzed GFT errors. Using three data sources (GFT, an archive of weekly ILI estimates from Google Flu Trends; ILIf, fully observed ILI rates from ILINet; and ILIp, ILI rates available in real time based on partial reporting), we analyzed five influenza seasons and computed the mean square errors (MSE) of GFT and ILIp as estimates of ILIf. To correct GFT errors, a random forest regression model was built with ILI and GFT rates from the previous three weeks as predictors. An overall reduction in error of 44% was observed, and the errors of the corrected GFT are lower than those of ILIp. An 80% reduction in error during 2012/13, when GFT had large errors, shows that extreme failures of GFT could have been avoided. Using autoregressive integrated moving average (ARIMA) models, one- to four-week-ahead forecasts were generated with two separate data streams: ILIp alone, and ILIp together with corrected GFT. At all forecast targets and seasons, and for all but two regions, inclusion of GFT lowered MSE. Results from two alternative error measures, mean absolute error and mean absolute proportional error, were largely consistent with results from MSE. Taken together, these findings provide an error profile of GFT in the US, establish strong evidence for the adoption of search-trends-based 'nowcasts' in influenza forecast systems, and encourage reevaluation of the utility of this data source in diverse domains.

Google Flu Trends (GFT) was proposed as a method to estimate influenza-like illness (ILI) in the general population and to be used in conjunction with traditional surveillance systems. Several previous studies have documented that GFT estimates were often overestimates of ILI. In this study, using a recently released archive of provisional incidence data from a large surveillance system in the US (ILINet), we report errors in GFT alongside errors from ILINet's initial estimates of ILI. This comparison, which uses only information available in real time, allows for a more nuanced assessment of GFT errors. Additionally, we describe a method to correct errors in GFT and show that the corrected GFT estimates are at least as accurate as initial estimates from ILINet. Finally, we show that including corrected GFT when forecasting ILI over the next four weeks considerably improves forecast accuracy. Taken together, our results indicate that the GFT model could have added value to traditional surveillance and forecasting systems, and that a reevaluation of the utility of the underlying search trends data, which is now more openly accessible, in fields beyond influenza is warranted.
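The error comparison described above can be illustrated with a minimal sketch. It is not the authors' code: the function names are hypothetical, the weekly rates are made-up numbers rather than data from the study, and "mean absolute proportional error" is assumed here to mean the absolute error divided by the fully observed rate, averaged over weeks.

```python
from statistics import fmean


def mse(estimates, truth):
    """Mean square error of estimates against the fully observed rates."""
    return fmean((e - t) ** 2 for e, t in zip(estimates, truth))


def mae(estimates, truth):
    """Mean absolute error."""
    return fmean(abs(e - t) for e, t in zip(estimates, truth))


def mape(estimates, truth):
    """Mean absolute proportional error (assumed definition: |e - t| / t).

    Requires every value in truth to be strictly positive.
    """
    return fmean(abs(e - t) / t for e, t in zip(estimates, truth))


def percent_reduction(baseline_error, new_error):
    """Percentage by which new_error is lower than baseline_error."""
    return 100.0 * (baseline_error - new_error) / baseline_error


# Illustrative weekly ILI rates only; NOT values from the paper.
ili_f = [2.0, 3.0, 4.0, 3.0]          # fully observed rates (reference)
ili_p = [1.5, 2.5, 3.5, 3.0]          # provisional real-time rates
gft = [3.0, 5.0, 6.0, 4.0]            # raw GFT estimates
gft_corrected = [2.5, 4.0, 5.0, 3.0]  # hypothetical corrected estimates

for name, series in [("GFT", gft), ("corrected GFT", gft_corrected), ("ILIp", ili_p)]:
    print(f"{name}: MSE={mse(series, ili_f):.4f} "
          f"MAE={mae(series, ili_f):.4f} MAPE={mape(series, ili_f):.4f}")

reduction = percent_reduction(mse(gft, ili_f), mse(gft_corrected, ili_f))
print(f"MSE reduction from correction: {reduction:.1f}%")
```

Treating ILIf as the reference for both GFT and ILIp mirrors the comparison in the abstract, where each real-time series is scored as an estimate of the fully observed rate.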

Reappraising the utility of Google Flu Trends

Reassessing Google Flu Trends Data for Detection of Seasonal and Pandemic Influenza: A Comparative Epidemiological Study at Three Geographic Scales

Improving Google Flu Trends Estimates for the United States through Transformation

Assessing Google Flu Trends Performance in the United States during the 2009 Influenza Virus A (H1N1) Pandemic

Google Flu Trends Spatial Variability Validated Against Emergency Department Influenza-Related Visits

Influenza Forecasting with Google Flu Trends

Improved forecasts of influenza-associated hospitalization rates with Google Search Trends

Evaluating Google Flu Trends in Latin America: Important Lessons for the Next Phase of Digital Disease Detection

Monitoring Influenza Activity in the United States: A Comparison of Traditional Surveillance Systems with Google Flu Trends

Improved Real-Time Influenza Surveillance: Using Internet Search Data in Eight Latin American Countries

[Google Flu Trends--the Initial Application of Big Data in Public Health].

Using web search queries to monitor influenza-like illness: an exploratory retrospective analysis, Netherlands, 2017/18 influenza season

Detecting influenza epidemics using search engine query data

Combining Search, Social Media, and Traditional Data Sources to Improve Influenza Surveillance

Utilizing Google Trends' Big Data for Epidemic Surveillance

PREPRINT: Using digital epidemiology methods to monitor influenza-like illness in the Netherlands in real-time: the 2017-2018 season

Using Google Trends and ambient temperature to predict seasonal influenza outbreaks

Age-Related Differences in the Accuracy of Web Query-Based Predictions of Influenza-Like Illness

Forecasting influenza-like illness trends in Cameroon using Google Search Data

Forecasting dengue and influenza incidences using a sparse representation of Google trends, electronic health records, and time series data

Using Transactional Big Data for Epidemiological Surveillance: Google Flu Trends and Ethical Implications of ‘Infodemiology’