Structural Characterization and Statistical Modeling of Nanopeptide Collision Cross-Sections in Ion Mobility Spectrometry

Bin Wu, Xiangjun Kong, Zheng Gao, Yuzhu Pan, Yonggang Ren, Yuanchao Li, Qingwu Yang, Fenglin Lv
DOI: https://doi.org/10.1166/jctn.2013.3222
2013-01-01
Journal of Computational and Theoretical Nanoscience
Abstract: The mobility behavior of 162 singly protonated nanopeptides in ion mobility spectrometry (IMS) is modeled and predicted using two types of structure characterization methods: local descriptors and global descriptors. The local descriptors are derived from a principal component analysis (PCA) of 516 physicochemical properties of the 20 standard amino acids, so that each amino acid residue in a nanopeptide sequence is represented in turn by its corresponding local descriptors. The global descriptors are determined with the CODESSA protocol, which treats a nanopeptide as an ordinary organic compound and generates more than 200 statistically significant variables to characterize the overall structure of the nanopeptide ion. Subsequently, nonlinear support vector machine (SVM) and Gaussian process (GP) regression, as well as linear partial least squares (PLS) regression, are employed to correlate these structural parameters with the IMS collision cross-section (Ω) of the nanopeptides. The resulting quantitative structure-spectrum relationship (QSSR) models are evaluated rigorously via 10-fold cross-validation and Monte Carlo cross-validation (MCCV). We also give a comprehensive comparison of the statistics arising from the different combinations of variable types and modeling methods, and find that the local descriptors give QSSR models with better fitting ability and predictive power, but worse interpretability, than the global descriptors. In addition, since QSSR modeling with local descriptors does not require the preparation of minimized structures of the nanopeptides, it would be considerably efficient as compared to that based
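The Monte Carlo cross-validation (MCCV) named in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the single-descriptor least-squares model is a simple stand-in for the PLS, SVM, and GP regressors, and the function names (`fit_line`, `mccv`), the number of rounds, the test fraction, and the toy data are all assumptions made for illustration.

```python
import random


def fit_line(xs, ys):
    """Ordinary least squares for y = a + b*x.

    Assumption: a one-descriptor linear model standing in for PLS/SVM/GP.
    """
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


def mccv(xs, ys, n_rounds=100, test_fraction=0.3, seed=0):
    """Monte Carlo cross-validation: repeated random train/test splits.

    Returns the root-mean-square error pooled over all held-out predictions.
    n_rounds and test_fraction are illustrative defaults, not values from the paper.
    """
    rng = random.Random(seed)
    n = len(xs)
    n_test = max(1, int(round(test_fraction * n)))
    squared_error = 0.0
    count = 0
    for _ in range(n_rounds):
        # Draw a fresh random partition in every round.
        indices = list(range(n))
        rng.shuffle(indices)
        test, train = indices[:n_test], indices[n_test:]
        intercept, slope = fit_line([xs[i] for i in train], [ys[i] for i in train])
        for i in test:
            squared_error += (ys[i] - (intercept + slope * xs[i])) ** 2
            count += 1
    return (squared_error / count) ** 0.5


# Toy data (hypothetical): one descriptor value and one response per sample.
descriptor = [float(i) for i in range(10)]
response = [2.0 * x + 1.0 for x in descriptor]
rmse = mccv(descriptor, response)
```

Unlike 10-fold cross-validation, which partitions the samples once into ten disjoint folds, MCCV redraws a random split in every round, so a given sample may be held out many times or not at all.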