Abstract: Domain boundary prediction is one of the most important problems in the study of protein structure and function, especially for large proteins. At present, most domain boundary prediction methods have low accuracy and are limited in handling multi-domain proteins. In this study, we develop a sequence-based protein domain boundary prediction method, named DomBpred. In DomBpred, the input sequence is first classified as either a single-domain or a multi-domain protein using a sequence metric designed on a constructed single-domain sequence library. For a multi-domain protein, a domain-residue clustering algorithm inspired by the Ising model is proposed to cluster spatially close residues according to inter-residue distance. The unclassified residues and the residues at the edges of the clusters are then adjusted using the secondary structure to form potential cut points. Finally, a domain boundary scoring function is proposed to recursively evaluate the potential cut points and generate the domain boundaries. DomBpred is tested on the large-scale test set of FUpred, comprising 2549 proteins. Experimental results show that DomBpred performs better than the state-of-the-art methods in classifying whether protein sequences are composed of single or multiple domains, with a Matthews correlation coefficient of 0.882. Moreover, on 849 multi-domain proteins, the domain boundary distance and normalised domain overlap scores of DomBpred are 0.523 and 0.824, respectively, which are 5.0% and 4.2% higher than those of the best comparison method. Comparison with other methods on the given test set shows that DomBpred outperforms most state-of-the-art sequence-based methods and even achieves better results than the top-level template-based method. The executable program is freely available at https://github.com/iobio-zjut/DomBpred and the online server at http://zhanglab-bioinf.com/DomBpred/.
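The single-domain versus multi-domain classification above is scored with the Matthews correlation coefficient (MCC). The block below is a minimal sketch of the standard binary MCC formula, not code from DomBpred. The function name `matthews_corrcoef_binary`, the label convention (1 = multi-domain, 0 = single-domain), and the toy labels are assumptions made for illustration.

```python
import math


def matthews_corrcoef_binary(y_true, y_pred):
    """Standard binary MCC; labels are truthy for the positive class.

    Assumed convention for this sketch: 1 = multi-domain, 0 = single-domain.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t and p)          # true positives
    tn = sum(1 for t, p in pairs if not t and not p)  # true negatives
    fp = sum(1 for t, p in pairs if not t and p)      # false positives
    fn = sum(1 for t, p in pairs if t and not p)      # false negatives

    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # By common convention, MCC is reported as 0 when the denominator vanishes.
    if denom == 0:
        return 0.0
    return (tp * tn - fp * fn) / denom


# Toy labels (hypothetical, not taken from the FUpred test set).
true_labels = [1, 1, 0, 0]
pred_labels = [1, 0, 0, 0]
mcc = matthews_corrcoef_binary(true_labels, pred_labels)
print(round(mcc, 3))
```

MCC ranges from -1 (total disagreement) through 0 (no better than chance) to 1 (perfect agreement). Because it uses all four cells of the confusion matrix, it is informative when the two classes are of unequal size.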

Prediction of protein domains from sequence information using support vector machines

Protein Domains Prediction Method Based on Support Vector Machines

Prediction of Functional Class of Proteins and Peptides Irrespective of Sequence Homology by Support Vector Machines.

Protein domain boundary prediction by combining support vector machine and domain guess by size algorithm

DomBpred: Protein Domain Boundary Prediction Based on Domain-Residue Clustering Using Inter-Residue Distance.

DomSVR: Domain Boundary Prediction with Support Vector Regression from Sequence Information Alone

Using an Ensemble of Support Vector Machine Classifiers to Predict Protein Supersecondary Structural Motifs.

Prediction of nucleic acid-binding proteins using support vector machines

Using Pseudo-Amino Acid Composition and Support Vector Machine to Predict Protein Structural Class.

Prediction of protein structural class using a combined representation of protein-sequence information and support vector machine

Application Research of Protein Structure Prediction Based Support Vector Machine

Support Vector Machine for Prediction of DNA-Binding Domains in Protein-DNA Complexes

Prediction of Protein Secondary Structure Content Using Support Vector Machine

Domain Position Prediction Based on Sequence Information by Using Fuzzy Mean Operator

Prediction of protein binding sites in protein structures using hidden Markov support vector machine

Sequence-Based Protein Domain Boundary Prediction Using BP Neural Network with Various Property Profiles

A Hybrid Method for Identification of Structural Domains

Using Support Vector Machines for Prediction of Protein Structural Classes Based on Discrete Wavelet Transform.

Predicting Protein Secondary Structure by a Support Vector Machine Based on a New Coding Scheme.

Prediction of Eukaryotic Protein Subcellular Location Using a Novel Feature Extraction Method and Support Vector Machine

Prediction of protein structure class by coupling improved genetic algorithm and support vector machine