Abstract:

Background: Predicting the binding sites between two interacting proteins provides important clues to the function of a protein. Recent research on protein binding site prediction has been based mainly on widely known machine learning techniques, such as artificial neural networks, support vector machines and conditional random fields. However, the prediction performance is still too low to be used in practice. It is necessary to explore new algorithms, theories and features to further improve the performance.

Results: In this study, we introduce a novel machine learning model, the hidden Markov support vector machine, for protein binding site prediction. The model treats protein binding site prediction as a sequential labelling task based on the maximum margin criterion. Common features derived from protein sequences and structures, including the protein sequence profile and residue accessible surface area, are used to train the hidden Markov support vector machine. When tested on six data sets, the method based on the hidden Markov support vector machine shows better performance than some state-of-the-art methods, including artificial neural networks, support vector machines and conditional random fields. Furthermore, its running time is several orders of magnitude shorter than that of the compared methods.

Conclusion: The improved prediction performance and computational efficiency of the method based on the hidden Markov support vector machine can be attributed to the following three factors. Firstly, the relation between the labels of neighbouring residues is useful for protein binding site prediction. Secondly, the kernel trick is very advantageous in this field. Thirdly, with the cutting-plane algorithm, the complexity of the training step for the hidden Markov support vector machine is linear in the number of training samples.
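To make the sequential-labelling view concrete, the following is a minimal sketch of how a label sequence could be decoded once per-residue scores and label-to-label transition scores are available. It is not the authors' implementation: the function name `viterbi`, the two-label scheme (0 = non-binding, 1 = binding) and all numeric scores are assumptions chosen for illustration, and the training step (maximum margin, cutting-plane) is not shown.

```python
# Illustrative sketch only: the names, the scores, and the two-label scheme
# (0 = non-binding, 1 = binding) are assumptions, not taken from the paper.
from typing import List, Sequence


def viterbi(emission: Sequence[Sequence[float]],
            transition: Sequence[Sequence[float]]) -> List[int]:
    """Return the highest-scoring label sequence for one protein chain.

    emission[t][y]   : score of label y at residue t
    transition[p][y] : score of label p being followed by label y
    """
    if not emission:
        return []
    n_labels = len(emission[0])
    score = [list(emission[0])]   # best score of any path ending in each label
    back = []                     # back-pointers, one row per residue after the first
    for t in range(1, len(emission)):
        row, ptr = [], []
        for y in range(n_labels):
            # Best previous label for ending at label y on residue t.
            best_prev = max(range(n_labels),
                            key=lambda p: score[-1][p] + transition[p][y])
            row.append(score[-1][best_prev] + transition[best_prev][y]
                       + emission[t][y])
            ptr.append(best_prev)
        score.append(row)
        back.append(ptr)
    # Trace the best path backwards from the best final label.
    y = max(range(n_labels), key=lambda k: score[-1][k])
    path = [y]
    for ptr in reversed(back):
        y = ptr[y]
        path.append(y)
    path.reverse()
    return path


# Hypothetical per-residue scores for a three-residue fragment.
emission = [[0.0, 1.0], [0.6, 0.4], [0.0, 1.0]]
no_coupling = [[0.0, 0.0], [0.0, 0.0]]      # labels scored independently
coupling = [[0.4, -0.4], [-0.4, 0.4]]       # neighbouring labels tend to agree

print(viterbi(emission, no_coupling))  # [1, 0, 1]
print(viterbi(emission, coupling))     # [1, 1, 1]
```

In this toy example the middle residue is labelled non-binding when residues are scored independently, but binding once neighbouring labels are coupled, which illustrates the first factor named in the conclusion.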

Training the Hidden Vector State Model from Un-annotated Corpus

Semi-supervised Learning of the Hidden Vector State Model for Protein-Protein Interactions Extraction

An Improved Hidden Vector State Model Approach and Its Adaptation in Extracting Protein Interaction Information from Biomedical Literature

Extracting Protein-Protein Interactions from the Literature Using the Hidden Vector State Model

Extracting Protein-Protein Interactions from MEDLINE using the Hidden Vector State model.

Extracting Protein-Protein Interaction based on Discriminative Training of the Hidden Vector State Model.

Ontology-Based Protein-Protein Interactions Extraction from Literature Using the Hidden Vector State Model

Biomedical events extraction using the hidden vector state model.

Discriminative Training of the Hidden Vector State Model for Semantic Parsing

A Hybrid Generative/Discriminative Framework to Train a Semantic Parser from an Un-annotated Corpus.

A novel framework of training hidden Markov support vector machines from lightly-annotated data.

Semi-supervised Method for Extraction of Protein-Protein Interactions Using Hybrid Model

Semantic Parsing for Biomedical Event Extraction.

A Predictive Model for Compound-Protein Interactions Based on Concatenated Vectorization

Prediction of protein binding sites in protein structures using hidden Markov support vector machine

Learning Conditional Random Fields From Unaligned Data For Natural Language Understanding

Global Vectors Representation of Protein Sequences and Its Application for Predicting Self-Interacting Proteins with Multi-Grained Cascade Forest Model.

Hybrid Model Of Neural Network And Hidden Markov Model For Protein Secondary Structure Prediction

Modeling Protein Using Large-scale Pretrain Language Model

A Protein-Protein Interaction Extraction Approach Based on Large Pre-trained Language Model and Adversarial Training.

Multimodal Pre-Training Model for Sequence-based Prediction of Protein-Protein Interaction