Extracting LncRNA-protein Interactions from Literature Using a Text Feature-based Approach

Qiguang Zang
DOI: https://doi.org/10.1016/j.neucom.2015.11.110
IF: 6
2016-01-01
Neurocomputing
Abstract: Long non-coding RNAs (lncRNAs) play important roles in regulation at the transcriptional and post-transcriptional levels. Knowledge of lncRNA-protein interactions (LPIs) is crucial for lncRNA-related biomedical research. Many newly discovered LPIs are reported in the biomedical literature. With over one million new biomedical journal articles published every year, keeping up with novel findings requires automatic information extraction by text mining. To address this issue, we apply a text feature-based text mining approach to efficiently extract LPIs from the biomedical literature. By employing natural language processing (NLP) technologies, this approach extracts text features from sentences that can precisely reflect real LPIs. Our approach involves four steps: data collection, text pre-processing, structured representation, and feature extraction with model training and classification. Our approach achieves an F-score of 79.5%, and the results indicate that it can efficiently extract LPIs from the biomedical literature.
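As a rough illustration of what sentence-level text features and the reported F-score metric look like in practice, here is a minimal Python sketch. It is not the paper's implementation: the trigger-word list, the feature names, and the helper functions `extract_features` and `f_score` are all hypothetical choices made for this example, since the abstract does not specify the actual feature set or classifier.

```python
import re

# Hypothetical interaction trigger words; the abstract does not list
# the features actually used in the paper.
TRIGGERS = {"binds", "interacts", "recruits", "associates"}
NEGATIONS = {"not", "no"}


def extract_features(sentence, lncrna, protein):
    """Turn one sentence mentioning a lncRNA/protein pair into simple features."""
    # Lowercase and split into alphanumeric tokens (hyphens kept inside names).
    tokens = re.findall(r"[A-Za-z0-9\-]+", sentence.lower())
    i = tokens.index(lncrna.lower())
    j = tokens.index(protein.lower())
    lo, hi = sorted((i, j))
    between = tokens[lo + 1:hi]
    return {
        # Number of tokens separating the two entity mentions.
        "token_distance": hi - lo - 1,
        # Whether an interaction trigger word sits between the entities.
        "trigger_between": any(t in TRIGGERS for t in between),
        # Whether the sentence contains a simple negation cue.
        "negated": any(t in NEGATIONS for t in tokens),
    }


def f_score(gold, predicted):
    """F1 score: harmonic mean of precision and recall over boolean labels."""
    tp = sum(1 for g, p in zip(gold, predicted) if g and p)
    fp = sum(1 for g, p in zip(gold, predicted) if not g and p)
    fn = sum(1 for g, p in zip(gold, predicted) if g and not p)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


features = extract_features("HOTAIR binds EZH2 in breast cancer cells.", "HOTAIR", "EZH2")
score = f_score([True, True, False, False], [True, False, True, False])
```

In a full pipeline, feature dictionaries like these would be fed to a trained classifier that labels each candidate sentence as describing an LPI or not; the F-score is then computed against manually annotated labels.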