Computer Prediction of Allergen Proteins from Sequence-Derived Protein Structural and Physicochemical Properties

Juan Cui, Lian Yi Han, Hu Li, Choong Yong Ung, Zhi Qun Tang, Chan Juan Zheng, Zhi Wei Cao, Yu Zong Chen
DOI: https://doi.org/10.1016/j.molimm.2006.02.010
IF: 4.174
2007-01-01
Molecular Immunology
Abstract:
Background: Computational methods have been developed for predicting allergen proteins from sequence segments that show identity, homology, or motif match to a known allergen. These methods achieve good prediction accuracies, but are less effective for novel proteins with no similarity to any known allergen.
Methods: This work tests the feasibility of using a statistical learning method, support vector machines (SVM), for predicting allergen proteins, including novel ones with no similarity to known allergens. The prediction system is trained and tested using 1005 allergen proteins from the Allergome database and 22,469 non-allergen proteins from 7871 Pfam families.
Results: Testing on an independent set of 229 allergen and 6717 non-allergen proteins from 7871 Pfam families shows that 93.0% and 99.9% of these, respectively, are correctly predicted, which is comparable to the best results of other methods. Of the 18 novel allergen proteins non-homologous to any other proteins in the Swissprot database, 88.9% are correctly predicted. A further screening of 168,128 proteins in the Swissprot database finds that 2.9% of the proteins are predicted to be allergen proteins, which is consistent with the estimated numbers from motif-based methods.
Conclusions: Our study suggests that SVM is a potentially useful method for predicting allergen proteins and that it has some capability for predicting novel allergen proteins. Our software can be accessed at http://jing.cz3.nus.edu.sg/cgi-bin/APPEL.
(c) 2006 Elsevier Ltd. All rights reserved.
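The percentages reported in the abstract can be turned into approximate protein counts with simple arithmetic. The sketch below is only an illustration: the helper name `implied_count` and the dictionary labels are hypothetical, and the resulting counts are rounded estimates derived from the quoted percentages, not figures stated by the authors.

```python
def implied_count(percent, total):
    """Return the whole number closest to `percent` percent of `total`."""
    return round(percent / 100 * total)


# (percentage, set size) pairs as quoted in the abstract.
reported = {
    "independent allergens correctly predicted": (93.0, 229),
    "independent non-allergens correctly predicted": (99.9, 6717),
    "novel allergens correctly predicted": (88.9, 18),
    "Swissprot proteins predicted as allergens": (2.9, 168128),
}

for label, (percent, total) in reported.items():
    count = implied_count(percent, total)
    # Recompute the percentage from the rounded count as a consistency check.
    print(f"{label}: about {count} of {total} ({100 * count / total:.1f}%)")
```

Recomputing the percentage from each rounded count gives a quick check that the estimate is consistent with the figure quoted in the abstract.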