Extraction of protein interaction information from unstructured text using a link grammar parser

R.A. Abul Seoud,A.-B.M. Youssef,Y.M. Kadah,Rania A. Abul Seoud,Abou-Bakr M. Youssef,Yasser M. Kadah
DOI: https://doi.org/10.1109/icces.2007.4447028
2007-11-01
Abstract:As research continues to generate vast amounts of data, pertaining to protein interactions, there is a critical need to capture these results in structured formats permitting for computational analysis. Automated the extraction of interactions from unstructured text, would improve the content of databases that store this information and set a method for managing the continued growth of new literature being published. Many algorithms have been reported for extracting biochemical interactions from biomedical text. Natural language processing approaches at various complexity levels have been recorded for extracting biochemical interactions from biomedical text. Some algorithms used simple template matching, others exploit sophisticated parsing techniques. In this paper, we present an automated NLP-based information extraction system, to identify protein interactions in biomedical text. Link grammar parsing can handle many syntactic structures and is computationally relatively efficient. Customizing the parser for the biomedical domain is expected to improve its performance further. Our approach is based on first, tagging biological entities with the help of biomedical and linguistic protein names databases. The system extracts complete interactions by analyzing the matching contents of syntactic roles and their linguistically significant combinations.
What problem does this paper attempt to address?