Predicting Immunogenic T-cell Epitopes by Combining Various Sequence-Derived Features.
Wen Zhang, Juan Liu, Yi Xiong, Meng Ke, Ke Zhang
DOI: https://doi.org/10.1109/bibm.2013.6732451
2013-01-01
Abstract: Predicting T-cell epitopes helps facilitate vaccine design and improves understanding of the immune system. In bioinformatics, MHC-binding peptides are commonly defined as T-cell epitopes, which trigger the immune response to antigens. However, a binding peptide does not necessarily activate an immune response; such peptides are non-immunogenic. Until now, little attention has been paid to immunogenic epitopes, so recognizing them remains a challenging task of practical value. This paper systematically evaluates a wide variety of sequence-derived features that have been used for epitope prediction or similar tasks, and reveals their relationship with epitope immunogenicity. We then consider how to exploit these features effectively for the computational prediction of immunogenic epitopes. Random forest is adopted as the classification engine, and an ensemble model is developed by averaging the scores of the individual feature-based predictors. Compared with previously published methods (POPI, POPISK and PAAQD), our models perform better on the benchmark datasets. Evaluated by t-test, the improvements of our method over existing methods are statistically significant (P < 0.01), showing promise for immunogenic epitope prediction. At present, only one MHC allele (HLA-A2) has sufficient data for immunogenicity studies. In the near future, as more immunogenic epitopes become available, we will carry out computational experiments on more MHC alleles. The source code for the ensemble model is available at: http://bcell.whu.edu.cn/sourcecode.html.
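The score-averaging step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the two feature encoders (`aa_composition`, `hydrophobic_fraction`), the hydrophobic residue set, and the toy peptides are assumptions made up for illustration, and a simple nearest-centroid scorer (`CentroidScorer`) stands in for the random forest that the paper uses for each feature. Only the final step, averaging the scores of the individual feature-based predictors, follows the abstract directly.

```python
from statistics import mean

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def aa_composition(peptide):
    """Illustrative feature: fraction of each of the 20 standard amino acids."""
    return [peptide.count(a) / len(peptide) for a in AMINO_ACIDS]


def hydrophobic_fraction(peptide):
    """Illustrative one-value feature; the residue set is an assumption."""
    return [sum(r in "AILMFVW" for r in peptide) / len(peptide)]


class CentroidScorer:
    """Stand-in for a per-feature classifier (the paper uses random forests).

    Scores a feature vector in [0, 1] by how much closer it lies to the
    centroid of the positive class than to that of the negative class.
    """

    def fit(self, X, y):
        pos = [x for x, label in zip(X, y) if label == 1]
        neg = [x for x, label in zip(X, y) if label == 0]
        self.pos = [mean(col) for col in zip(*pos)]
        self.neg = [mean(col) for col in zip(*neg)]
        return self

    def score(self, x):
        d_pos = sum((a - b) ** 2 for a, b in zip(x, self.pos)) ** 0.5
        d_neg = sum((a - b) ** 2 for a, b in zip(x, self.neg)) ** 0.5
        total = d_pos + d_neg
        return 0.5 if total == 0 else d_neg / total


def train_ensemble(peptides, labels, encoders):
    """Train one predictor per feature encoding."""
    return [
        (enc, CentroidScorer().fit([enc(p) for p in peptides], labels))
        for enc in encoders
    ]


def ensemble_score(peptide, ensemble):
    """Average the scores of the individual feature-based predictors."""
    return mean(scorer.score(enc(peptide)) for enc, scorer in ensemble)


# Made-up 9-mers for demonstration only (1 = immunogenic, 0 = non-immunogenic).
train_peptides = ["AILMFVWAL", "LLVFAIMWV", "DEKRSTNQG", "KRDESTQNG"]
train_labels = [1, 1, 0, 0]

ensemble = train_ensemble(
    train_peptides, train_labels, [aa_composition, hydrophobic_fraction]
)
print(ensemble_score("ALLVFMWIA", ensemble))
print(ensemble_score("DKESRTQNG", ensemble))
```

Because each predictor sees only one feature encoding, a new feature type can be added by appending another encoder to the list; with a real classifier, the per-feature score would typically be the predicted probability of the immunogenic class.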