Abstract: Domain boundary prediction is one of the most important problems in the study of protein structure and function, especially for large proteins. At present, most domain boundary prediction methods have low accuracy and are limited in their ability to handle multi-domain proteins. In this study, we develop a sequence-based protein domain boundary prediction method, named DomBpred. In DomBpred, the input sequence is first classified as either a single-domain or a multi-domain protein using an effective sequence metric designed on the basis of a constructed single-domain sequence library. For a multi-domain protein, a domain-residue clustering algorithm inspired by the Ising model is proposed to cluster spatially close residues according to inter-residue distance. The unclassified residues and the residues at the edges of the clusters are then tuned using secondary structure to form potential cut points. Finally, a domain boundary scoring function is proposed to recursively evaluate the potential cut points and generate the domain boundaries. DomBpred is tested on the large-scale FUpred test set comprising 2549 proteins. Experimental results show that DomBpred performs better than state-of-the-art methods in classifying whether a protein sequence is composed of a single domain or multiple domains, with a Matthews correlation coefficient (MCC) of 0.882. Moreover, on 849 multi-domain proteins, the domain boundary distance (DBD) and normalised domain overlap (NDO) scores of DomBpred are 0.523 and 0.824, respectively, which are 5.0% and 4.2% higher than those of the best comparison method. Comparison with other methods on the given test set shows that DomBpred outperforms most state-of-the-art sequence-based methods and even achieves better results than the top-level template-based method. The executable program is freely available at https://github.com/iobio-zjut/DomBpred and the online server at http://zhanglab-bioinf.com/DomBpred/.
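The single- versus multi-domain classification step is summarised above by its Matthews correlation coefficient. As an illustration of what that figure measures, the sketch below computes MCC from true and predicted labels with the standard formula MCC = (TP·TN − FP·FN) / sqrt((TP+FP)(TP+FN)(TN+FP)(TN+FN)). The function name `matthews_corrcoef` and the `"single"`/`"multi"` label strings are hypothetical choices for this example; it is not taken from the DomBpred code, and returning 0 when a marginal count is zero is an assumed (though common) convention.

```python
import math


def matthews_corrcoef(y_true, y_pred, positive="multi"):
    """Matthews correlation coefficient for a binary classifier.

    Hypothetical helper: labels are strings such as "single" / "multi";
    `positive` selects which label counts as the positive class.
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")

    tp = tn = fp = fn = 0
    for t, p in zip(y_true, y_pred):
        if p == positive:
            # Predicted positive: correct if the true label is also positive.
            if t == positive:
                tp += 1
            else:
                fp += 1
        else:
            # Predicted negative: a positive true label is a miss.
            if t == positive:
                fn += 1
            else:
                tn += 1

    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # Assumed convention: report 0 when any marginal count is zero.
    return (tp * tn - fp * fn) / denom if denom else 0.0


if __name__ == "__main__":
    truth = ["multi", "multi", "single", "single", "single"]
    preds = ["multi", "single", "single", "single", "multi"]
    print(matthews_corrcoef(truth, preds))
```

In the usage example there is one true positive, one false negative, two true negatives and one false positive, so MCC = (1·2 − 1·1) / sqrt(2·2·3·3) = 1/6 ≈ 0.167. MCC ranges from −1 to 1 and is unchanged if the roles of the two classes are swapped; unlike plain accuracy, it stays informative when single-domain and multi-domain proteins are imbalanced in a test set.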

DomBpred: Protein Domain Boundary Prediction Based on Domain-Residue Clustering Using Inter-Residue Distance.