Intelligent Bibliometrics for Discovering the Associations Between Genes and Diseases: Methodology and Case Study

Mengjia Wu, Yi Zhang
2020-01-01
Abstract: Discovering disease-gene associations is an essential but challenging task in modern medicine. Among the data-driven approaches to this problem, literature-based knowledge discovery extends the boundaries of discovery and uncovers implicit knowledge from unstructured textual data. However, most current literature-based methods require specific expertise or prior knowledge. In this paper, we propose an adaptable and transferable methodology to 1) identify crucial genetic factors for a specific disease and 2) predict emerging genetic associations for the disease. Specifically, biomedical entities including diseases, chemicals, genes, and genetic variations are extracted from literature data; a heterogeneous co-occurrence network is then constructed, and a semantic adjacency matrix is generated using the idea of Word2Vec. Following this, key genes and genetic variants are identified through centrality measurement on the network, and emerging disease-gene associations are captured via a link prediction approach enhanced by the semantic matrix. We applied the proposed methodology to a literature dataset containing 54,219 scientific articles on atrial fibrillation (AF) to demonstrate its reliability. The results yielded a) crucial biomedical entities for AF, highlighting five key gene groups and one potentially associated protein mutation; and b) a list of emerging AF-genetic factor pairs that are worth in-depth exploration.
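The pipeline outlined in the abstract (co-occurrence network, centrality, semantically enhanced link prediction) can be illustrated with a minimal Python sketch. Everything here is an assumption made for illustration and is not the authors' implementation: the toy articles, the hand-set two-dimensional vectors standing in for Word2Vec embeddings, the choice of degree centrality, and the scoring rule (common-neighbor count multiplied by cosine similarity). The abstract does not specify which centrality measure or which link prediction formula the paper uses.

```python
from itertools import combinations
import math

# Hypothetical toy corpus: the set of entities extracted from each article.
docs = [
    {"AF", "KCNQ1", "SCN5A"},
    {"AF", "KCNQ1", "amiodarone"},
    {"AF", "PITX2"},
    {"SCN5A", "PITX2", "amiodarone"},
    {"KCNQ1", "TBX5"},
    {"SCN5A", "TBX5"},
]

# Hypothetical 2-d vectors standing in for Word2Vec embeddings.
vectors = {
    "AF": (1.0, 0.0),
    "KCNQ1": (0.9, 0.1),
    "SCN5A": (0.8, 0.2),
    "PITX2": (0.7, 0.3),
    "TBX5": (0.6, 0.8),
    "amiodarone": (0.1, 0.9),
}

def build_cooccurrence(docs):
    """Weighted, undirected co-occurrence network as a dict of dicts."""
    adj = {}
    for entities in docs:
        for a, b in combinations(sorted(entities), 2):
            adj.setdefault(a, {}).setdefault(b, 0)
            adj.setdefault(b, {}).setdefault(a, 0)
            adj[a][b] += 1
            adj[b][a] += 1
    return adj

def degree_centrality(adj):
    """Fraction of the other nodes that each node is linked to."""
    n = len(adj)
    return {node: len(nbrs) / (n - 1) for node, nbrs in adj.items()}

def cosine(u, v):
    dot = sum(x * y for x, y in zip(u, v))
    return dot / (math.hypot(*u) * math.hypot(*v))

def link_score(adj, vectors, a, b):
    """Assumed scoring rule: common neighbors weighted by semantic similarity."""
    common = set(adj.get(a, {})) & set(adj.get(b, {}))
    return len(common) * cosine(vectors[a], vectors[b])

def predict_links(adj, vectors, disease):
    """Rank entities not yet linked to the disease by the assumed score."""
    candidates = [n for n in adj if n != disease and n not in adj[disease]]
    scored = [(n, link_score(adj, vectors, disease, n)) for n in candidates]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)

adj = build_cooccurrence(docs)
centrality = degree_centrality(adj)
predictions = predict_links(adj, vectors, "AF")
```

In this toy network, `centrality` ranks entities by how many distinct partners they co-occur with, and `predictions` lists the entities that never co-occur with AF but share neighbors with it, ordered by the assumed score.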