Abstract: Protein-protein interactions (PPIs) play a key role in many aspects of the structural and functional organization of the cell, and knowledge about them reveals the molecular mechanisms of biological processes. Databases such as MINT (Zanzoni et al., 2002), BIND (Bader et al., 2003), and DIP (Xenarios et al., 2002) have been created to store protein interaction information in structured, standard formats. However, the biomedical literature on protein interactions is growing rapidly, which makes it difficult for interaction database curators to detect and curate this information manually. As a result, most protein interaction information remains hidden in the text of published papers, and automatic extraction of protein interaction information from the biomedical literature has become an important research area.
Existing PPI extraction work can be roughly divided into three categories: manual pattern engineering approaches, grammar engineering approaches, and machine learning approaches.
Manual pattern engineering approaches define a set of rules for possible textual relationships, called patterns, which encode the recurring structures used to express relationships. The SUISEKI system uses regular expressions, each with a probability that reflects the pattern's experimental accuracy, to extract interactions into predefined frame structures (Blaschke & Valencia, 2002). Ono et al. manually defined a set of rules based on syntactic features to preprocess complex sentences, and also took negation structures into account (Ono et al., 2001). The BioRAT system uses manually engineered templates that combine lexical and semantic information to identify protein interactions (Corney et al., 2004). Such manual pattern engineering approaches are very hard to scale up to large document collections, because they require labor-intensive and skill-dependent pattern engineering.
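The pattern-based idea above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not a reconstruction of SUISEKI, BioRAT, or the rules of Ono et al.: the templates, their scores, the protein dictionary, and the negation cue list are all hypothetical. Each regular expression fills a small interaction frame and carries a probability standing in for the pattern's measured accuracy, and sentences containing a negation cue are discarded.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class InteractionFrame:
    """A predefined frame filled in when a pattern fires."""
    protein_a: str  # acting (or first-mentioned) protein
    protein_b: str  # partner protein
    verb: str       # interaction keyword matched in the text
    score: float    # probability attached to the pattern that fired


# Hypothetical templates and scores, for illustration only. "{P}" is a slot
# that is replaced by an alternation of known protein names.
TEMPLATES = [
    (r"(?P<a>{P}) (?P<verb>interacts with|binds to|binds) (?P<b>{P})", 0.8),
    # Passive voice: the protein after "by" is the acting one, so it fills slot a.
    (r"(?P<b>{P}) is (?P<verb>phosphorylated|activated) by (?P<a>{P})", 0.6),
    (r"(?P<verb>interaction) between (?P<a>{P}) and (?P<b>{P})", 0.5),
]

# Crude sentence-level negation filter; the cue list is an assumption.
NEGATION = re.compile(r"\b(?:not|no|fails? to|neither|nor)\b", re.IGNORECASE)


def compile_patterns(proteins):
    """Instantiate every template with the given protein dictionary."""
    # Longer names first so multi-word names are tried before their prefixes.
    names = sorted(proteins, key=len, reverse=True)
    slot = r"\b(?:" + "|".join(re.escape(n) for n in names) + r")\b"
    return [(re.compile(t.replace("{P}", slot)), s) for t, s in TEMPLATES]


def extract(sentence, proteins):
    """Return the best-scoring frame for each protein pair in the sentence."""
    if NEGATION.search(sentence):
        return []  # negated statements are not reported as interactions
    best = {}
    for pattern, score in compile_patterns(proteins):
        for m in pattern.finditer(sentence):
            frame = InteractionFrame(m["a"], m["b"], m["verb"], score)
            key = (frame.protein_a, frame.protein_b)
            # Keep only the most reliable pattern's frame for each pair.
            if key not in best or score > best[key].score:
                best[key] = frame
    return list(best.values())


proteins = {"RecA", "LexA", "DnaK", "DnaJ", "GrpE"}
print(extract("RecA binds LexA, and the interaction between RecA and LexA is strong.", proteins))
# → [InteractionFrame(protein_a='RecA', protein_b='LexA', verb='binds', score=0.8)]
```

Keeping the highest-scoring frame per protein pair is one simple way to use per-pattern probabilities. The hand-written template list also illustrates the scaling problem described above: every new phrasing of an interaction needs yet another pattern.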
Grammar engineering approaches use manually written, specialized grammar rules that perform a deep parse of the sentences. Sekimizu et al. used a shallow parser, EngCG, to generate syntactic, morphological, and boundary tags (Sekimizu et al., 1998); based on the tagging results, they recognized the subjects and objects of the most frequently used verbs. Fundel et al. proposed RelEx, which extracts relations from dependency parse trees (Fundel et al., 2007).
Machine learning techniques for extracting protein interaction information have gained interest in recent years. In most recent work on machine learning for PPI extraction, the PPI extraction task is cast as learning a decision function that determines for each
