Monitoring Infectious Diseases in the Big Data Era

Yuanqiang Zou, Yousong Peng, Lizong Deng, Taijiao Jiang
DOI: https://doi.org/10.1007/s11434-014-0696-5
IF: 18.9
2014-01-01
Science Bulletin
Abstract: Infectious diseases have had, and will continue to have, a significant impact on public health. Following the unprecedented H7N9 outbreak in China in early 2013, the recent spread of the Ebola virus in West Africa has once again placed the danger of infectious diseases in the public eye and caused alarm around the world. Given the frequent recurrence of rapidly evolving pathogens, such as seasonal influenza viruses, and the sporadic introduction of novel pathogens, such as the 2003 SARS coronavirus in China, the prevention and control of infectious diseases has become a major global public health issue, one that relies heavily on an effective surveillance strategy. In China, the surveillance of infectious diseases is mainly conducted by a nationwide monitoring network consisting of numerous hospitals and health departments. Under the guidance of the Chinese Center for Disease Control and Prevention (CDC), these monitoring sites watch for, report, collect and analyze suspicious samples. In addition to the traditional methods of collecting immunological and biological data on emerging pathogens, large-scale gene sequencing is now widely used in the surveillance of infectious diseases. The large amount of genomic data on infectious diseases has facilitated more accurate and rapid identification of pathogens, thereby assisting in the prediction of their potential pathological and epidemiological characteristics. Taking the unprecedented 2013 H7N9 virus as an example, the considerable genomic data enabled Chinese researchers to rapidly identify the novel pathogens and infer their origins and evolutionary pathways [1, 2]; this assisted the formulation of proper measures for controlling human infections with the viruses. Given the ever-increasing volume of genomic data, the question of how to model it effectively to infer the characteristics of infectious diseases has challenged traditional molecular evolutionary analysis approaches.
Previous studies have shown that, compared with traditional phylogenetic analysis, extracting co-evolutionary signals and advanced features from large-scale genomic data on seasonal influenza viruses can capture the characteristics of their antigenic changes; this could lead to more accurate and timely recommendation of seasonal vaccine strains [3, 4]. There is no doubt that coupling large-scale gene sequencing with advanced computational modeling has created new opportunities for fighting infectious diseases effectively. Knowing how infectious diseases emerge and spread is the key to controlling them. Unfortunately, it is very difficult, and sometimes infeasible, to collect important information such as infection and transmission rates at the beginning of an epidemic. With the development of Internet technology, large-scale public participation will become an important means of gathering outbreak information. More and more people are seeking medical help, and thereby sharing their health information, on the Internet. Proper mining of this type of Internet big data could therefore help monitor the dynamics of infectious diseases. The most famous example is Google Flu Trends, which can rapidly predict influenza activity by aggregating Google search queries [5].
Y. Zou, Y. Peng, T. Jiang (corresponding author): College of Computer Science and Electronic Engineering, Hunan University, Changsha 410082, China. E-mail: taijiao@moon.ibp.ac.cn