Abstract: Protein-protein interactions (PPIs) play a key role in various aspects of the structural and functional organization of the cell. Knowledge about them unveils the molecular mechanisms of biological processes. A number of databases such as MINT (Zanzoni et al., 2002), BIND (Bader et al., 2003), and DIP (Xenarios et al., 2002) have been created to store protein interaction information in structured and standard formats. However, the amount of biomedical literature on protein interactions is increasing rapidly, and it is difficult for interaction database curators to detect and curate protein interaction information manually. Thus, most protein interaction information remains hidden in the text of published papers. Automatic extraction of protein interaction information from biomedical literature has therefore become an important research area.
Existing work on PPI extraction can be roughly divided into three categories: manual pattern engineering approaches, grammar engineering approaches, and machine learning approaches. Manual pattern engineering approaches define a set of rules, called patterns, that capture the recurring textual structures used to express relationships. The SUISEKI system uses regular expressions, with probabilities that reflect the experimental accuracy of each pattern, to extract interactions into predefined frame structures (Blaschke & Valencia, 2002). Ono et al. manually defined a set of rules based on syntactic features to preprocess complex sentences, taking negation structures into account as well (Ono et al., 2001). The BioRAT system uses manually engineered templates that combine lexical and semantic information to identify protein interactions (Corney et al., 2004). Such manual pattern engineering approaches to information extraction are very hard to scale up to large document collections, since they require labor-intensive and skill-dependent pattern engineering.
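To make the idea of a manually engineered pattern concrete, the following is a minimal sketch of regular-expression-based extraction. It is not the pattern set of SUISEKI or of any other cited system: the verb list, the toy protein-name pattern, and the function name `extract_interactions` are all assumptions introduced here for illustration.

```python
import re

# Hypothetical list of interaction verbs; a real system would rely on a
# curated lexicon rather than this short illustrative set.
INTERACTION_VERBS = r"(?:interacts with|binds(?: to)?|phosphorylates|activates|inhibits)"

# Toy protein-name pattern (an assumption): a token that starts with an
# uppercase letter, followed by letters, digits, or hyphens.
PROTEIN = r"[A-Z][A-Za-z0-9-]*"

# One hand-written pattern of the form "<protein> <verb> <protein>".
PATTERN = re.compile(rf"\b({PROTEIN})\s+({INTERACTION_VERBS})\s+({PROTEIN})\b")


def extract_interactions(sentence):
    """Return (protein, verb, protein) triples matched by the single pattern."""
    return [(m.group(1), m.group(2), m.group(3)) for m in PATTERN.finditer(sentence)]


if __name__ == "__main__":
    print(extract_interactions("RAD51 interacts with BRCA2 in vivo."))
```

Even this small sketch hints at the scaling problem noted above: every new phrasing (passives, nominalizations, negation, or protein names that do not fit the toy pattern) needs another hand-written rule.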
Grammar engineering approaches use manually generated specialized grammar rules that perform a deep parse of the sentences. Sekimizu et al. used a shallow parser, EngCG, to generate syntactic, morphological, and boundary tags (Sekimizu et al., 1998). Based on the tagging results, subjects and objects were recognized for the most frequently used verbs. Fundel et al. proposed RelEx, which extracts relations from dependency parse trees (Fundel et al., 2007).
Machine learning techniques for extracting protein interaction information have gained interest in recent years. In most recent work on machine learning for PPI extraction, the task is cast as learning a decision function that determines for each
