“Viral” Spread of COVID-19: The Digital World Reflects and Precedes Reality (Preprint)
Karan Gill,Giovanni Cacciamani,Paolo Dell’Oglio,Andrea Cocci,Giorgio Russo,Inderbir Gill
DOI: https://doi.org/10.2196/preprints.22378
2020-07-09
Abstract:
BACKGROUND With the rapid global spread of COVID-19, clinicians and researchers are searching for an early warning signal that could forewarn of this pandemic, in order to mitigate its transmission and catastrophic effects. As such, multi-disciplinary approaches for the early identification and prevention of such future events within the wider context of the sustainable development goals have been prioritized (1). In this regard, one approach could be to assess whether the general public’s interest in web-searches about COVID-19 might serve as that early warning signal, especially given the recent worldwide surge in internet traffic on this topic. During the last pandemic in 2003, the SARS outbreak, the number of internet users was only 13.3% of the total users today (https://www.internetworldstats.com/emarketing.htm). In 2019, of the almost 5.5 billion daily worldwide Google search engine queries (SEQs), approximately 7% were about healthcare (2). Currently, over 60% of Americans use the web to find healthcare information. Infodemiology, i.e. information epidemiology using web-based data, has been correlated with infectious disease outbreaks, cancer awareness campaigns and cancer incidence and mortality (3-6). Since SEQs reflect public interest in a given topic, we hypothesized that this readily available tool could help inform about emerging COVID-19 outbreaks. We assessed the extent of online interest in this pandemic to determine whether COVID-19 related SEQs correlated with its reported incidence and mortality rate. Also, COVID-19 related websites were evaluated for the quality of the information therein.
OBJECTIVE The objective of the study is to explore the correlation between web SEQs about COVID-19 and its actual incidence and mortality.
METHODS To test our hypothesis, we prospectively evaluated on a daily basis the web-engine search results for “coronavirus” or “COVID-19” in South Korea, Japan, Italy, France, Germany, Spain, U.K. and U.S.
over a 114-day period (01/01/2020 - 04/23/2020). Daily COVID-19 epidemiology data on incidence and mortality were obtained from the eCDC for the corresponding countries (Supplementary). To measure COVID-19 awareness, a join-point regression model was used to identify significant increasing or decreasing SEQ and COVID-19 epidemiologic trends. Relative search volume (RSV) and average daily percent change (ADPC) were compared to assess loss or gain of interest over time. Daily SEQs were temporally and geographically correlated with the daily eCDC-reported COVID-19 incidence and mortality data. To assess the temporal predictive value of SEQ trends, we quantified a ‘lag period’, defined as the time between the first statistically significant rising trend in RSV and the first such trend in the corresponding COVID-19 epidemiological curves (Supplementary). We graded websites using a validated instrument (DISCERN) for content evaluation and compared quality with user popularity and dissemination (Supplementary).
RESULTS Join-point regression analysis of European and U.S. SEQ curves showed that SEQs mirrored epidemiology trends (maximum RSV with ADPC of +6.5% and +14.3%, respectively; Figure 1). During the pandemic’s upswing phase, the rising daily SEQ trends strongly correlated with the rising daily COVID-19 incidence and mortality trends in Europe and the U.S. (correlation-coefficient 0.84-0.95; P<.0001; Table 1). Considering the overall pandemic data for Europe and the U.S., correlation coefficients of daily SEQs and COVID-19 incidence were 0.84 and 0.83 and for mortality were 0.77 and 0.78, respectively (P<.0001; Supplementary). For the 5 analyzed European countries, SEQ correlation coefficients ranged from 0.73 to 0.84 for COVID-19 incidence and from 0.47 to 0.76 for mortality (all P<.001; Supplementary). ‘Lag time’ analysis during the upswing of the European pandemic specifically revealed that European SEQ curves preceded the actual initial European cases by 29 days and the initial European mortalities by 31 days.
Similarly, during the U.S. pandemic upswing, SEQ curves preceded the actual initial U.S. cases by 33 days and the initial U.S. mortalities by 38 days. During the SEQ curve downswing in Europe, SEQs declined significantly (-2.8% daily; P<.0001), preceding the actual decline of European cases by 17 days and the decline in European mortalities by 20 days. During the U.S. SEQ curve downswing, SEQs started declining significantly (-4.0% daily; P<.0001), preceding the actual decline of U.S. cases by 38 days and the decline in U.S. mortality by 32 days. After peaking, when European SEQs started decreasing, European COVID-19 cases/mortality were initially still increasing, and subsequently started to flatten and decrease. This reflects the ‘lag time’ phenomenon between SEQ trends and pandemic trends. Also, by this time, information saturation could be a factor, wherein the public had already obtained the relevant information from the web, leading to decreased SEQ traffic. Similar trends were seen in all 5 European countries studied. Resolution of the U.S. pandemic is still ongoing. To assess the accuracy of online information, we downloaded 1,000 webpages about COVID-19 and randomly selected 100 using a random-sampling generator. We found that almost 30% of webpages lacked scientific citations to support their content, and 54% lacked author names. Official and institutional websites were more likely to provide this information than unofficial independent websites (DISCERN score 3.34 and 2.58; P=.049).
CONCLUSIONS Our prospective analysis shows that SEQ trends closely reflected (and preceded by over two weeks) the actual COVID-19 trends in Europe and the U.S. The focus of our paper is the upswing phase of the pandemic, wherein all tested SEQ correlations were strongly positive (correlation-coefficient 0.84-0.95 for incidence/mortality; all P<.0001; Table 1).
Since this signal was especially robust during the pandemic’s upswing (peaking March 15), SEQs could provide advance warning of this pandemic. Whether SEQs might also reflect ongoing progress and/or resolution of the pandemic requires further research. We used the eCDC database since it clearly reports daily data for COVID-19 incidence and mortality for Europe and the U.S. Instead of using symptom-related searches, we queried the definitive terms “coronavirus” or “COVID-19” to assess the relationship between SEQ data and disease prevalence and mortality. We prioritized mortality over incidence, since the true incidence of COVID-19 is unknown due to insufficient testing. U.S. state-by-state analysis was not possible due to lack of relevant, granular data. Also, detailed analyses for China and Asia were not done due to insufficient Google data; however, we found that SEQ trends for Europe/U.S. lagged behind Asia by approximately 1 month (Supplementary). During the last 2 weeks of January, likely driven by developing news of the China/Asia epidemic, there was early European/U.S. SEQ interest, which proved transient, given the minimal number of European/U.S. cases at that time. Interestingly, SEQs for hand sanitizer, face masks and social distancing paralleled COVID-19 spikes (Supplementary). Rigorously conducted mathematical modelling might estimate the relative effects of interventions and help guide policy decision-making. Yet, the predictive ability of these models depends on the quality of the underlying data and the assumptions made, which can range widely and evolve over time. To wit, COVID-19 mortality estimates in the U.S. ranged from 480,000 deaths initially, to 100,000-240,000 deaths, to 60,000-80,000 deaths more recently (7). Since SEQ data closely tracked emergence of the actual COVID-19 pandemic, incorporation of web-based metrics into current mathematical modelling algorithms can be explored.
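As an illustration of how a search-interest series could be related to an epidemiological series with a time offset, the sketch below scans candidate offsets and reports the one with the highest Pearson correlation. This is a simplified cross-correlation scan, not the join-point-based ‘lag period’ definition used in the study; the function name `best_lead`, the `max_lag` parameter and the synthetic curves are assumptions made only for this example.

```python
import numpy as np

def best_lead(search, cases, max_lag):
    """Return (lag, r): the number of days by which `search` leads `cases`
    that gives the highest Pearson correlation, scanning 0..max_lag."""
    search = np.asarray(search, dtype=float)
    cases = np.asarray(cases, dtype=float)
    if len(search) != len(cases):
        raise ValueError("series must cover the same days")
    best_lag, best_r = 0, -np.inf
    for lag in range(max_lag + 1):
        # Pair search interest on day t with cases on day t + lag.
        x = search[: len(search) - lag]
        y = cases[lag:]
        r = np.corrcoef(x, y)[0, 1]
        if r > best_r:
            best_lag, best_r = lag, r
    return best_lag, best_r

# Synthetic data only (not study data): two identical bell-shaped curves,
# with the "cases" curve shifted 20 days after the "search" curve.
days = np.arange(120)
synthetic_search = np.exp(-((days - 40) / 8.0) ** 2)
synthetic_cases = np.exp(-((days - 60) / 8.0) ** 2)
demo_lag, demo_r = best_lead(synthetic_search, synthetic_cases, max_lag=40)
print(demo_lag, round(demo_r, 3))
```

On these synthetic curves the scan recovers the 20-day offset that was built into the data. Real series are noisier and autocorrelated, so a peak in this scan would need to be interpreted with more care than the toy case suggests.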
Limitations of internet-based infodemiology include availability of relative, not absolute, Google data; disallowed Google usage in China; anonymous/unverifiable user motivations; unknown raw numbers of users; inability to examine sub-populations; and uncertain impact on public health practices (3-6). Also, the impact of the news media on web search traffic can be considerable. Therefore, separating disease-driven from news-driven web-searches seems appropriate. However, our analysis shows that, irrespective of their underlying drivers, overall SEQ trends paralleled COVID-19 trends; this increases the generalizability of our findings. Despite the continuously increasing access to healthcare information, public health literacy remains an issue (8). Greater and more scientifically based public understanding of the characteristics and transmission patterns of COVID-19 can influence public behavior, making people more likely to follow official guidelines. Doing so can more effectively prepare us in the event of a second wave of this virus (9). Our finding that posted information about COVID-19 is often inaccurate represents an opportunity for improvement. We propose that federal authorities urgently convene a taskforce charged with developing evidence-based, minimal standards in order to objectively rate COVID-19 related websites. Individual websites voluntarily applying for, and obtaining, such a federal rating would be officially promoted to the public as containing reliable information (5). Since false information is dangerous during a pandemic, the above strategy would encourage unofficial websites to prioritize reliable, scientific, referenced research over the “more visits, better indexing” strategy. Such an initiative would help properly educate the public, protecting them from dubious, unsubstantiated information during a possible second wave or future pandemics.
People relied heavily on the web for information on COVID-19, providing a unique opportunity to study human web-usage behavior during a pandemic. Web-search trends correlated with and preceded COVID-19 epidemiologic trends. Further investigations evaluating the raw number of SEQs are needed. If our hypothesis is valid, by harnessing the collective crowd-input of tens of millions, web search queries could provide an early warning signal of any future waves of COVID-19, helping save lives. The ability of governments and public health institutions to rapidly identify and respond to the outbreak is key to containing the virus. Providing pertinent information at this critical moment has never been so crucial.
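For readers unfamiliar with the daily percent changes quoted for the SEQ curves, the following sketch shows how such a figure can be obtained for a single trend segment by fitting a straight line to the logarithm of the series. It assumes that ADPC denotes an average daily percent change and that a log-linear model is used, as is common in join-point analyses; it does not search for join points, and the function name `daily_percent_change` and the sample values are hypothetical.

```python
import numpy as np

def daily_percent_change(values):
    """Fit log(value) = a + b * day by least squares and return
    100 * (exp(b) - 1), the constant daily percent change of the segment.
    All values must be strictly positive."""
    y = np.log(np.asarray(values, dtype=float))
    days = np.arange(len(y))
    slope, _intercept = np.polyfit(days, y, 1)
    return 100.0 * (np.exp(slope) - 1.0)

# Synthetic segment (not study data): search volume growing 6.5% per day.
synthetic_segment = 50.0 * 1.065 ** np.arange(14)
demo_change = daily_percent_change(synthetic_segment)
print(round(demo_change, 2))
```

A full join-point analysis would additionally choose the number and position of breakpoints and test each segment for statistical significance, which this single-segment sketch leaves out.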