Comprehensive Comparison of Eight Statistical Modelling Methods Used in Quantitative Structure-Retention Relationship Studies for Liquid Chromatographic Retention Times of Peptides Generated by Protease Digestion of the Escherichia Coli Proteome

Peng Zhou, Feifei Tian, Fenglin Lv, Zhicai Shang
DOI: https://doi.org/10.1016/j.chroma.2009.01.086
2009-01-01
Abstract: In this study, we propose a new peptide characterization method that accounts for both amino acid composition and the local environment of each residue. Using this approach, the structural characteristics of peptides derived from the Escherichia coli proteome were parameterized. On that basis, the performance profiles of eight statistical modelling methods were rigorously validated and comprehensively compared. The methods were applied to model the relationship between sequence structure and retention ability for 816 experimentally measured peptides, and to predict normalized liquid chromatographic retention times for 121,273 unmeasured peptides. The results show that regression models built with nonlinear approaches are more robust and predictive, but also more time-consuming, than those built with linear ones. Among these modelling methods, Gaussian process and back-propagation neural network models show the best stability, unbiasedness and predictive power, so they can be used to model peptide structure-retention relationships accurately. Multiple linear regression and partial least squares regression perform worse than the nonlinear techniques but are computationally efficient, which makes them promising candidates for qualitative problems involving massive data. In addition, by examining descriptor importance across the different models, we found that amino acid composition shows a significant linear correlation with peptide retention time, whereas the residue environment is correlated with peptide retention mainly in a nonlinear way. The polar residue Arg and strongly hydrophobic residues such as Leu, Ile, Phe, Trp and Val are the critical factors influencing peptide retention behavior.
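The abstract contrasts linear methods such as multiple linear regression (MLR) with nonlinear ones such as Gaussian process (GP) regression. The sketch below is a minimal illustration under stated assumptions. It uses only an amino acid composition descriptor, a count of each of the 20 standard residues. The residue local-environment terms of the authors' characterization are not specified in the abstract, so they are omitted. The function names, the RBF kernel with fixed hyperparameters, and the synthetic retention data are hypothetical and do not represent the authors' descriptors, models or measurements.

```python
import numpy as np

# The 20 standard amino acids in one-letter code (fixed column order).
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def composition_vector(peptide: str) -> np.ndarray:
    """Hypothetical descriptor: count of each standard residue in the peptide.
    The paper's residue-environment terms are NOT reproduced here."""
    counts = np.zeros(len(AMINO_ACIDS))
    for residue in peptide.upper():
        idx = AMINO_ACIDS.find(residue)
        if idx < 0:
            raise ValueError(f"non-standard residue: {residue!r}")
        counts[idx] += 1
    return counts

def fit_mlr(X: np.ndarray, y: np.ndarray) -> np.ndarray:
    """Ordinary least-squares MLR with an intercept.
    Returns [intercept, coef_1, ..., coef_p]."""
    design = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coef

def predict_mlr(coef: np.ndarray, X: np.ndarray) -> np.ndarray:
    return coef[0] + X @ coef[1:]

def rbf_kernel(A: np.ndarray, B: np.ndarray, length_scale: float = 1.0) -> np.ndarray:
    """Squared-exponential (RBF) kernel between the rows of A and B."""
    sq = np.sum(A**2, axis=1)[:, None] + np.sum(B**2, axis=1)[None, :] - 2 * A @ B.T
    return np.exp(-0.5 * np.maximum(sq, 0.0) / length_scale**2)

def gp_predict(X_train, y_train, X_test, length_scale=1.0, noise=1e-2):
    """GP posterior mean with an RBF kernel and FIXED (assumed) hyperparameters;
    targets are centred on their training mean before solving."""
    mean = y_train.mean()
    K = rbf_kernel(X_train, X_train, length_scale) + noise * np.eye(len(X_train))
    alpha = np.linalg.solve(K, y_train - mean)
    return mean + rbf_kernel(X_test, X_train, length_scale) @ alpha

# Synthetic toy data (hypothetical, not the paper's 816 peptides):
# random peptides of length 6-20 and a noiseless linear "retention time".
rng = np.random.default_rng(0)
peptides = ["".join(rng.choice(list(AMINO_ACIDS), size=rng.integers(6, 21)))
            for _ in range(200)]
X = np.array([composition_vector(p) for p in peptides])
true_w = rng.normal(size=len(AMINO_ACIDS))  # hypothetical per-residue contributions
y = 2.0 + X @ true_w

coef = fit_mlr(X, y)                                   # linear model
gp_pred = gp_predict(X[:150], y[:150], X[150:], length_scale=5.0)  # nonlinear model
mlr_pred = predict_mlr(coef, X[150:])
```

Because the synthetic retention times here are an exact linear function of composition, MLR recovers the hypothetical weights. The nonlinear residue-environment effects described in the abstract are where a GP could be expected to add value on real data. In practice, GP hyperparameters would normally be fitted (for example, by maximizing the marginal likelihood) rather than fixed as they are here.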