Abstract: Background The development of high-throughput technologies has produced several large-scale protein interaction data sets for multiple species, and significant efforts have been made to analyze these data sets in order to understand protein activities. Because the basic units of protein interactions are domain interactions, it is crucial to understand protein interactions at the level of domains. The availability of many diverse biological data sets provides an opportunity to discover the domain interactions underlying protein interactions by integrating these data sets.
Results We combine protein interaction data sets from multiple species, molecular sequences, and gene ontology to construct a set of high-confidence domain-domain interactions. First, we propose a new measure, the expected number of interactions for each pair of domains, to score domain interactions based on protein interaction data in one species, and we show that it performs similarly to the E-value defined by Riley et al. [1]. Our new measure is applied to the protein interaction data sets from yeast, worm, fruit fly, and humans. Second, information on pairs of domains that coexist in known proteins and on pairs of domains with the same gene ontology function annotations is incorporated to construct a high-confidence set of domain-domain interactions using a Bayesian approach. Finally, we evaluate the set of domain-domain interactions by comparing the predicted domain interactions with those defined in the iPfam database [2, 3], which were derived from protein structures. The accuracy of the predicted domain interactions is also confirmed by comparison with experimentally obtained domain interactions from H. pylori [4]. As a result, a total of 2,391 high-confidence domain interactions are obtained, and these domain interactions are used to unravel detailed protein and domain interactions in several protein complexes.
Conclusion Our study shows that integration of multiple biological data sets based on the Bayesian approach provides a reliable framework to predict domain interactions. By integrating multiple data sources, the coverage and accuracy of predicted domain interactions can be significantly increased.
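
As an illustration of how a Bayesian approach can combine several evidence sources for one domain pair, the sketch below applies Bayes' rule in odds form. It is a minimal sketch, not the model used in this study: it assumes the evidence sources are conditionally independent (a naive Bayes assumption), and the function name `posterior_probability`, the prior, and the likelihood ratios are all hypothetical values chosen for illustration.

```python
from math import prod


def posterior_probability(prior: float, likelihood_ratios: list[float]) -> float:
    """Combine evidence for one domain pair using Bayes' rule in odds form.

    ASSUMPTION: evidence sources are conditionally independent, so their
    likelihood ratios multiply. The study's actual model may differ.
    """
    if not 0.0 < prior < 1.0:
        raise ValueError("prior must lie strictly between 0 and 1")
    if any(lr < 0.0 for lr in likelihood_ratios):
        raise ValueError("likelihood ratios must be non-negative")

    # Convert the prior probability to odds, update, then convert back.
    prior_odds = prior / (1.0 - prior)
    posterior_odds = prior_odds * prod(likelihood_ratios)
    return posterior_odds / (1.0 + posterior_odds)


# Hypothetical inputs: a 1% prior that a random domain pair interacts, and
# three made-up likelihood ratios standing in for interaction-based score,
# domain coexistence, and shared gene ontology annotation.
example = posterior_probability(0.01, [20.0, 5.0, 3.0])
print(f"posterior probability: {example:.3f}")
```

Working in odds form keeps each evidence source as a single multiplicative factor, so a source can be added or dropped without changing the rest of the calculation. A ratio above 1 raises the posterior and a ratio below 1 lowers it.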
