Predicting protein N-terminal signal peptides using position-specific amino acid propensities and conditional random fields

YongXian Fan, Jiangning Song, Chen Xu, Hongbin Shen
DOI: https://doi.org/10.2174/1574893611308020006
2013-01-01
Current Bioinformatics
Abstract: Protein signal peptides play a vital role in the targeting and translocation of most secreted proteins and many integral membrane proteins in both prokaryotes and eukaryotes. Consequently, accurate prediction of signal peptides and their cleavage sites is an important task in molecular biology. In the present study, we first develop a novel discriminative scoring method for classifying proteins as having or lacking signal peptides. This method captures the characteristics of signal peptides and non-signal peptides by integrating hydrophobicity alignment with position-specific amino acid propensities based on the highest average positions. In leave-one-out jackknife tests on the constructed benchmark datasets, it discriminates proteins with signal peptides at overall accuracies of 96.3%, 97.0%, and 97.2% for eukaryotes, Gram-negative bacteria, and Gram-positive bacteria, respectively. Second, we treat the prediction of signal peptide cleavage sites as a sequence labeling problem and apply the Conditional Random Fields (CRFs) algorithm to solve it. Experimental results demonstrate that the proposed CRF-based cleavage site finding approach achieves prediction success rates of 80.8%, 89.4%, and 74.0%, respectively, for secretory proteins from the three organism groups. An online tool, LnSignal, has been established for labeling N-terminal signal cleavage sites and is freely available for academic use at http://www.csbio.sjtu.edu.cn/bioinf/LnSignal.
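To make the sequence labeling formulation concrete, the following is a minimal sketch of how a protein with a known cleavage site might be turned into per-residue labels, and how a predicted label string might be turned back into a cleavage position. The two-letter label alphabet (`S`, `M`), the 60-residue N-terminal window, and the function names `encode_labels` and `decode_cleavage_site` are illustrative assumptions, not the encoding used by the authors or by LnSignal. A CRF would be trained on such label strings; the training step itself is not shown.

```python
from typing import Optional

# Assumed label alphabet (hypothetical, not taken from the paper).
SIGNAL = "S"  # residue inside the signal peptide
MATURE = "M"  # residue in the mature protein


def encode_labels(sequence: str, cleavage_pos: int, window: int = 60) -> str:
    """Label the N-terminal window of a sequence.

    cleavage_pos is the signal peptide length: cleavage occurs after
    residue number cleavage_pos (1-based). The window size is an assumption.
    """
    if not 0 <= cleavage_pos <= len(sequence):
        raise ValueError("cleavage_pos must lie within the sequence")
    n = min(len(sequence), window)       # residues actually labeled
    n_signal = min(cleavage_pos, n)      # signal residues inside the window
    return SIGNAL * n_signal + MATURE * (n - n_signal)


def decode_cleavage_site(labels: str) -> Optional[int]:
    """Return the predicted signal peptide length, or None.

    None is returned when the labels contain no S-to-M boundary, i.e. the
    string starts with M (no signal peptide) or never leaves S.
    """
    first_mature = labels.find(MATURE)
    if first_mature <= 0:
        return None
    return first_mature


# Toy example: a 24-residue sequence with cleavage after residue 21.
example_sequence = "MKKTAIAIAVALAGFATVAQAAPK"
example_labels = encode_labels(example_sequence, 21)
example_site = decode_cleavage_site(example_labels)
```

Under this framing, a cleavage site prediction counts as correct only when the decoded boundary matches the annotated position exactly, which is one reason cleavage site success rates are typically lower than the accuracy of the presence/absence classification.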