Identification of hidden relationships from the coupling of Hydrophobic Cluster Analysis and Domain Architecture information

Guilhem Faure,Isabelle Callebaut
DOI: https://doi.org/10.1093/bioinformatics/btt271
IF: 5.8
2013-05-14
Bioinformatics
Abstract:MOTIVATION: Describing domain architecture is a critical step in the functional characterization of proteins. However, some orphan domains do not match any profile stored in dedicated domain databases and are thereby difficult to analyze.RESULTS: We present here an original novel approach, called TREMOLO-HCA, for the analysis of orphan domain sequences and inspired from our experience in the use of Hydrophobic Cluster Analysis (HCA). Hidden relationships between protein sequences can be more easily identified from the PSI-BLAST results, using information on domain architecture, HCA plots and the conservation degree of amino acids that may participate in the protein core. This can lead to reveal remote relationships with known families of domains, as illustrated here with the identification of a hidden Tudor tandem in the human BAHCC1 protein and a hidden ET domain in the Saccharomyces cerevisiae Taf14p and human AF9 proteins. The results obtained in such a way are consistent with those provided by HHPRED, based on pairwise comparisons of HHMs. Our approach can, however, be applied even in absence of domain profiles or known 3D structures for the identification of novel families of domains. It can also be used in a reverse way for refining domain profiles, by starting from known protein domain families and identifying highly divergent members, hitherto considered as orphan.AVAILABILITY: We provide a possible integration of this approach in an open TREMOLO-HCA package, which is fully implemented in python v2.7 and is available on request. Instructions are available at http://www.impmc.upmc.fr/∼callebau/tremolohca.html.CONTACT: isabelle.callebaut@impmc.upmc.frSUPPLEMENTARY INFORMATION: Supplementary Data are available at Bioinformatics online.
biochemical research methods,biotechnology & applied microbiology,mathematical & computational biology
What problem does this paper attempt to address?