Signal-3L 2.0: A Hierarchical Mixture Model for Enhancing Protein Signal Peptide Prediction by Incorporating Residue-Domain Cross-Level Features.

Yi-Ze Zhang,Hong-Bin Shen
DOI: https://doi.org/10.1021/acs.jcim.6b00484
IF: 6.162
2017-01-01
Journal of Chemical Information and Modeling
Abstract:Signal peptides play key roles in targeting and translocation of integral membrane proteins and secretory proteins. However, signal peptides present several challenges for automatic prediction methods. One challenge is that it is difficult to discriminate signal peptides from transmembrane helices, as both the H-region of the peptides and the transmembrane helices are hydrophobic. Another is that it is difficult to identify the cleavage site between signal peptides and mature proteins, as cleavage motifs or patterns are still unclear for most proteins. To solve these problems and further enhance automatic signal peptide recognition, we report a new Signal-3L 2.0 predictor. Our new model is constructed with a hierarchical protocol, where it first determines the existence of a signal peptide. For this, we propose a new. residue-domain cross-level feature-driven approach, and we demonstrate that protein functional domain information is particularly useful for discriminating between the trans-membrane helices and signal peptides as they perform different functions. Next, in order to accurately identify the unique signal peptide cleavage sites along the sequence, we designed a top-down approach where a subset of potential cleavage sites are screened using statistical learning rules, and then a final unique site is selected according to its evolution conservation score. Because this mixed approach utilizes both statistical learning and evolution analysis, it shows a strong capacity for recognizing cleavage sites. Signal-3L 2.0 has been benchmarked on multiple data, sets, and the experimental results have demonstrated its accuracy.
What problem does this paper attempt to address?