Literature Mining of Protein Phosphorylation Using Dependency Parse Trees

Mang Wang, Hong Xia, Dongdong Sun, Zhaoxiong Chen, Minghui Wang, Ao Li
DOI: https://doi.org/10.1016/j.ymeth.2014.01.008
IF: 4.647
2014-01-01
Methods
Abstract: As one of the most common post-translational modifications (PTMs), protein phosphorylation plays an important role in various biological processes, such as signal transduction, cellular metabolism, differentiation, growth, regulation and apoptosis. Knowledge of protein phosphorylation is valuable not only for elucidating the underlying molecular mechanisms but also for treating diseases and designing new drugs. Recently, there has been increasing interest in automatically extracting phosphorylation information from the biomedical literature. However, this remains a challenging task because of the tremendous volume of literature and the varied, often indirect ways in which protein phosphorylation is described. To address this issue, we propose a novel text-mining method for efficiently retrieving and extracting protein phosphorylation information from the literature. Using natural language processing (NLP) techniques, the method transforms each sentence into a dependency parse tree that captures the intrinsic relationships among phosphorylation-related keywords; detailed information on substrates, kinases and phosphorylation sites is then extracted from these trees based on syntactic patterns. Compared with other existing approaches, the proposed method demonstrates significantly improved performance, suggesting that it is a powerful bioinformatics approach for retrieving phosphorylation information from a large body of literature. A web server for the proposed method is freely available at http://bioinformatics.ustc.edu.cn/pptm/.
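The abstract says that substrates, kinases and sites are read off dependency parse trees using syntactic patterns, but it does not list those patterns. The following is a minimal sketch of the general idea, assuming a sentence has already been parsed. Here the arcs are written by hand with spaCy/Stanford-style labels (nsubj, dobj, nsubjpass, prep, agent, pobj); in practice a dependency parser would supply them. The names Token and extract, the trigger list, the two patterns and the site regex are hypothetical illustrations, not the rules used by the authors or by the PPTM server.

```python
import re
from dataclasses import dataclass


@dataclass
class Token:
    idx: int    # position in the sentence
    text: str   # surface form
    lemma: str  # base form
    head: int   # index of the governing token; -1 marks the root
    dep: str    # dependency label (spaCy/Stanford-style, assumed)


# Hypothetical residue-site pattern, e.g. "Ser15", "Thr-308", "Y705".
SITE_RE = re.compile(r"^(Ser|Thr|Tyr|S|T|Y)-?\d+$")
# Hypothetical trigger lemmas for phosphorylation events.
TRIGGERS = {"phosphorylate", "phosphorylation"}


def children(tokens, i):
    """Return the direct dependents of token i."""
    return [t for t in tokens if t.head == i]


def extract(tokens):
    """Fill kinase/substrate/site slots from the dependents of each trigger word."""
    records = []
    for trig in tokens:
        if trig.lemma not in TRIGGERS:
            continue
        rec = {"kinase": None, "substrate": None, "site": None}
        for child in children(tokens, trig.idx):
            # Verbal pattern: KINASE <-nsubj- phosphorylate -dobj-> SUBSTRATE
            if child.dep == "nsubj":
                rec["kinase"] = child.text
            # Direct object, or subject of a passive verb, is the substrate.
            elif child.dep in ("dobj", "nsubjpass"):
                rec["substrate"] = child.text
            # Prepositional arguments: "of X", "by Y", "at/on SITE".
            elif child.dep in ("prep", "agent"):
                for obj in children(tokens, child.idx):
                    if obj.dep != "pobj":
                        continue
                    if child.lemma == "of":
                        rec["substrate"] = obj.text
                    elif child.lemma == "by":
                        rec["kinase"] = obj.text
                    elif child.lemma in ("at", "on") and SITE_RE.match(obj.text):
                        rec["site"] = obj.text
        # Keep only events where a substrate was found.
        if rec["substrate"] is not None:
            records.append(rec)
    return records


# Verbal form: "ATM phosphorylates p53 at Ser15" (arcs written by hand).
active = [
    Token(0, "ATM", "ATM", 1, "nsubj"),
    Token(1, "phosphorylates", "phosphorylate", -1, "ROOT"),
    Token(2, "p53", "p53", 1, "dobj"),
    Token(3, "at", "at", 1, "prep"),
    Token(4, "Ser15", "Ser15", 3, "pobj"),
]

# Nominal form: "Phosphorylation of p53 by ATM on Ser15" (arcs written by hand).
nominal = [
    Token(0, "Phosphorylation", "phosphorylation", -1, "ROOT"),
    Token(1, "of", "of", 0, "prep"),
    Token(2, "p53", "p53", 1, "pobj"),
    Token(3, "by", "by", 0, "prep"),
    Token(4, "ATM", "ATM", 3, "pobj"),
    Token(5, "on", "on", 0, "prep"),
    Token(6, "Ser15", "Ser15", 5, "pobj"),
]

print(extract(active))   # [{'kinase': 'ATM', 'substrate': 'p53', 'site': 'Ser15'}]
print(extract(nominal))  # [{'kinase': 'ATM', 'substrate': 'p53', 'site': 'Ser15'}]
```

Because the rules inspect the dependents of the trigger word rather than surface word order, the two differently worded sentences yield the same record. This illustrates why a dependency tree is a convenient representation for phosphorylation statements phrased in varied ways.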