A Text Feature-Based Approach for Literature Mining of lncRNA-Protein Interactions

Ao Li,Qiguang Zang,Dongdong Sun,Minghui Wang
DOI: https://doi.org/10.1016/j.neucom.2015.11.110
2015-01-01
IFAC-PapersOnLine
Abstract: Long non-coding RNAs (lncRNAs) play important roles in regulation at the transcriptional and post-transcriptional levels. Knowledge of lncRNA-protein interactions (LPIs) is crucial for lncRNA-related biomedical research. Many newly discovered LPIs are recorded in the biomedical literature. With over one million new biomedical journal articles published every year, keeping up with novel findings requires automatic information extraction by text mining. To address this issue, we apply a text feature-based text mining approach to efficiently extract LPIs from the biomedical literature. By employing natural language processing (NLP) technologies, this approach extracts text features from sentences that can precisely reflect real LPIs. Our approach involves four steps, including data collection, text pre-processing, structured representation, feature extraction, and model training and classification. Our approach achieves an F-score of 79.5%, and the results indicate that it can efficiently extract LPIs from biomedical literature.
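The feature extraction step and the F-score metric mentioned in the abstract can be illustrated with a minimal Python sketch. The abstract does not list the actual features or classifier, so everything here is an assumption made for illustration: the keyword set `INTERACTION_WORDS`, the three features, and the helper names `extract_features` and `f_score` are hypothetical and are not taken from the paper.

```python
import re

# Hypothetical interaction keywords (an assumption; the paper's real
# feature set is not given in the abstract).
INTERACTION_WORDS = {"binds", "interacts", "associates", "recruits"}


def extract_features(sentence, lncrna, protein):
    """Return simple text features for one lncRNA/protein pair in a sentence.

    Returns None when either entity name is not found as a token.
    """
    tokens = re.findall(r"[A-Za-z0-9\-]+", sentence.lower())
    l, p = lncrna.lower(), protein.lower()
    if l not in tokens or p not in tokens:
        return None
    lo, hi = sorted((tokens.index(l), tokens.index(p)))
    between = tokens[lo + 1:hi]
    return {
        # Number of token positions separating the two entity mentions.
        "token_distance": hi - lo,
        # Whether an interaction keyword sits between the two mentions.
        "keyword_between": any(t in INTERACTION_WORDS for t in between),
        # Total interaction keywords anywhere in the sentence.
        "keyword_count": sum(t in INTERACTION_WORDS for t in tokens),
    }


def f_score(tp, fp, fn):
    """Harmonic mean of precision and recall (the F1 score)."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    # Illustrative sentence written for this sketch, not drawn from the paper's data.
    print(extract_features("HOTAIR binds EZH2 in breast cancer cells.", "HOTAIR", "EZH2"))
    print(f_score(tp=8, fp=2, fn=2))
```

In a full pipeline, feature dictionaries like these would be turned into vectors and passed to a trained classifier; the choice of classifier is left open here because the abstract does not name one.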