A profile entropy dependent scoring function for protein threading

Jian Peng, Chi Xu
2009-01-01
Abstract: Proteins play fundamental roles in all biological processes. Akin to the complete sequencing of genomes, a complete description of protein structures is a fundamental step towards understanding biological life, and is also highly relevant to the development of therapeutics and drugs. Computational prediction methods, especially template-based modeling, can quickly generate crude but useful structure models at a large scale. The challenge of template-based modeling lies in recognizing correct templates and generating accurate sequence-template alignments. Evolutionary information (i.e., sequence profiles) has proved to be very powerful in detecting remote homologs, as demonstrated by the state-of-the-art profile-profile alignment method HHpred. However, many proteins still lack good sequence profiles. Here, we present a new protein threading method for such proteins that nonlinearly combines evolutionary and non-evolutionary information. In particular, we model protein threading using a probabilistic graphical model, the Conditional (Markov) Random Field (CRF), and train the model using a gradient tree boosting algorithm. The resultant threading model guides sequence-template alignment using a nonlinear scoring function consisting of a collection of regression trees. Each regression tree models a type of nonlinear relationship among different kinds of protein information. Experimental results indicate that when evolutionary information is not good enough, this new threading method greatly outperforms HHpred in terms of both alignment accuracy and fold recognition rate. The paradigm presented here for the design of a nonlinear scoring function is very general and can also be applied to protein sequence alignment and RNA alignment.
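As an illustration only, the idea of a scoring function built as a sum of regression trees, and then used to guide an alignment, can be sketched in Python. This is not the authors' CRF model or their trained trees: the two hand-written trees, the feature names `seq_sim` and `ss_match`, the `(residue, secondary-structure label)` encoding, the gap penalty, and the use of plain Needleman-Wunsch dynamic programming are all assumptions made for the example.

```python
from typing import Callable, Dict, List, Sequence, Tuple

Features = Dict[str, float]
Tree = Callable[[Features], float]
# Hypothetical encoding of one position: (residue, secondary-structure label).
Position = Tuple[str, str]


def tree_sequence_then_structure(x: Features) -> float:
    # Depth-2 tree: trust strong sequence evidence; otherwise fall back
    # on secondary-structure agreement (a nonlinear interaction).
    if x["seq_sim"] > 0.5:
        return 2.0
    return 1.0 if x["ss_match"] > 0.5 else -1.0


def tree_structure_bonus(x: Features) -> float:
    # Depth-1 tree: small bonus when secondary structure agrees.
    return 0.5 if x["ss_match"] > 0.5 else 0.0


# Hand-written stand-ins for trees that boosting would normally learn.
TREES: List[Tree] = [tree_sequence_then_structure, tree_structure_bonus]


def score(x: Features, trees: Sequence[Tree] = TREES) -> float:
    # The scoring function is the sum of all tree outputs.
    return sum(tree(x) for tree in trees)


def pair_features(q: Position, t: Position) -> Features:
    # Toy features: residue identity stands in for profile similarity.
    return {
        "seq_sim": 1.0 if q[0] == t[0] else 0.0,
        "ss_match": 1.0 if q[1] == t[1] else 0.0,
    }


def align_score(query: Sequence[Position],
                template: Sequence[Position],
                gap: float = -1.0) -> float:
    # Global alignment (Needleman-Wunsch) with a linear gap penalty,
    # where each match state is scored by the tree ensemble.
    n, m = len(query), len(template)
    dp = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        dp[i][0] = i * gap
    for j in range(1, m + 1):
        dp[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            match = dp[i - 1][j - 1] + score(
                pair_features(query[i - 1], template[j - 1]))
            dp[i][j] = max(match, dp[i - 1][j] + gap, dp[i][j - 1] + gap)
    return dp[n][m]
```

In this sketch the first tree returns a different value for structure agreement depending on whether the sequence signal is strong, which is the kind of interaction a purely linear weighting of the two features could not express.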