Predicting DNA-binding proteins: approached from Chou’s pseudo amino acid composition and other specific sequence features

Y. Fang, Y. Guo, Y. Feng, M. Li
DOI: https://doi.org/10.1007/s00726-007-0568-2
IF: 3.7891
2007-01-01
Amino Acids
Abstract: DNA-binding proteins play a pivotal role in gene regulation, so an automated and efficient method for the timely identification of novel DNA-binding proteins is vitally important. In this study, we propose a method that predicts DNA-binding proteins from their primary sequences alone. Proteins were encoded by the auto cross covariance transform, pseudo-amino acid composition, and dipeptide composition, both individually and in different combinations of the three encoding methods; the resulting feature matrices were then supplied to support vector machine classifiers to predict DNA-binding proteins. All models were trained and validated with the jackknife cross-validation test. Comparing the performance of these alternative models, the best result was obtained with pseudo-amino acid composition, which reached an overall accuracy of 96.6% and a sensitivity of 90.7%. The results suggest that novel DNA-binding proteins can be predicted efficiently using only their primary sequences.
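As a rough illustration of one of the encodings named in the abstract, the sketch below computes a dipeptide composition vector and the two reported metrics. It assumes the commonly used definition of dipeptide composition (the relative frequency of each of the 400 ordered pairs of the 20 standard amino acids); the abstract does not spell out the authors' exact formulation. The names `dipeptide_composition` and `accuracy_and_sensitivity` are hypothetical and not taken from the paper.

```python
from itertools import product

# The 20 standard amino acids (one-letter codes).
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
# All 400 ordered residue pairs, in a fixed order that defines the vector layout.
DIPEPTIDES = ["".join(pair) for pair in product(AMINO_ACIDS, repeat=2)]
DIPEPTIDE_INDEX = {dp: i for i, dp in enumerate(DIPEPTIDES)}


def dipeptide_composition(sequence):
    """Return a 400-dimensional vector of dipeptide relative frequencies.

    Assumption: overlapping adjacent pairs are counted, and pairs that
    contain a non-standard residue (e.g. 'X') are skipped.
    """
    seq = sequence.upper()
    counts = [0] * len(DIPEPTIDES)
    total = 0
    for i in range(len(seq) - 1):
        idx = DIPEPTIDE_INDEX.get(seq[i:i + 2])
        if idx is not None:
            counts[idx] += 1
            total += 1
    if total == 0:
        return [0.0] * len(DIPEPTIDES)
    return [c / total for c in counts]


def accuracy_and_sensitivity(y_true, y_pred):
    """Overall accuracy and sensitivity (recall on the positive class, label 1)."""
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("label lists must be non-empty and of equal length")
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    positives = sum(1 for t in y_true if t == 1)
    true_pos = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    accuracy = correct / len(y_true)
    # Sensitivity is undefined without positives; 0.0 is returned by convention here.
    sensitivity = true_pos / positives if positives else 0.0
    return accuracy, sensitivity
```

In a jackknife (leave-one-out) test, each protein would be held out in turn, a classifier trained on the remaining feature vectors, and the held-out prediction collected; the collected predictions could then be passed to `accuracy_and_sensitivity`.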