Prediction of Ion Drift Times for a Proteome-Wide Peptide Set Using Partial Least Squares Regression, Least-Squares Support Vector Machine and Gaussian Process

Xiuhong Liu, Jun Liang, Jicai Fan, Zhicai Shang
DOI: https://doi.org/10.1002/qsar.200910075
2009-01-01
QSAR & Combinatorial Science
Abstract: Quantitative structure-property relationships (QSPRs) have been developed to predict the ion mobility spectrometry (IMS) drift time (tD) for a set of 1481 peptides generated by protease digestion of the Drosophila melanogaster proteome, using information derived directly from molecular structures. The relationship between peptide structure and tD was modeled using partial least squares regression (PLS), least-squares support vector machine (LSSVM) and Gaussian process (GP) coupled with genetic algorithm-variable selection. Among these models, the linear PLS was incapable of capturing all dependences in this peptide system, whereas the nonlinear LSSVM and GP methods showed good statistical performance in reproducing peptide mobility behavior. Moreover, since GP was able to handle a hybrid of linear and nonlinear relationships, it gave a stronger fitting ability and better predictive power than LSSVM. Systematic analysis of the GP model showed that diverse properties contribute markedly to the relationship between the drift time and the peptide structure. In particular, structural topological information and charge distribution contribute significantly to the drift time of peptides.
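The contrast the abstract draws between a purely linear model and a GP that mixes linear and nonlinear terms can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not taken from the paper: the data are synthetic (a single made-up descriptor, not the peptide set), ordinary least squares stands in for the linear baseline (it is not PLS), the kernel is a simple sum of a linear term and an RBF term with hand-picked hyperparameters, and no genetic algorithm variable selection is performed.

```python
import numpy as np


def hybrid_kernel(A, B, length_scale=0.5):
    """Sum of a linear (with bias) kernel and an RBF kernel between rows of A and B."""
    linear = 1.0 + A @ B.T
    sq_dist = ((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=-1)
    rbf = np.exp(-0.5 * sq_dist / length_scale**2)
    return linear + rbf


def gp_predict(X_train, y_train, X_test, noise_var=0.05**2):
    """GP posterior mean: k(X*, X) [K + noise_var * I]^-1 y."""
    K = hybrid_kernel(X_train, X_train) + noise_var * np.eye(len(X_train))
    alpha = np.linalg.solve(K, y_train)
    return hybrid_kernel(X_test, X_train) @ alpha


def linear_predict(X_train, y_train, X_test):
    """Ordinary least squares with an intercept (a stand-in linear baseline, not PLS)."""
    design = np.column_stack([np.ones(len(X_train)), X_train])
    coef, *_ = np.linalg.lstsq(design, y_train, rcond=None)
    return np.column_stack([np.ones(len(X_test)), X_test]) @ coef


def rmse(y_true, y_pred):
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))


def synthetic_response(X):
    """Hypothetical target: a linear trend plus a nonlinear component."""
    return 2.0 * X[:, 0] + np.sin(3.0 * X[:, 0])


# Synthetic data: one hypothetical descriptor, interleaved train/test points.
rng = np.random.default_rng(0)
X_train = np.linspace(0.0, 3.0, 30).reshape(-1, 1)
X_test = np.linspace(0.05, 2.95, 30).reshape(-1, 1)
y_train = synthetic_response(X_train) + rng.normal(0.0, 0.05, size=30)
y_test = synthetic_response(X_test)

lin_rmse = rmse(y_test, linear_predict(X_train, y_train, X_test))
gp_rmse = rmse(y_test, gp_predict(X_train, y_train, X_test))
print(f"linear RMSE: {lin_rmse:.3f}, GP RMSE: {gp_rmse:.3f}")
```

On this toy problem the linear fit can only recover the trend, while the summed kernel lets the GP model the trend and the nonlinear residual together. This mirrors the qualitative claim in the abstract but says nothing about the magnitudes reported in the paper.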