Improve accuracy and sensibility in glycan structure prediction by matching glycan isotope abundance.

Guang Xu, Xin Liu, Qing Yan Liu, Yanhong Zhou, Jianjun Li
DOI: https://doi.org/10.1016/j.aca.2012.07.009
IF: 6.911
2012-01-01
Analytica Chimica Acta
Abstract: Mass spectrometry (MS) is a powerful technique for determining glycan structures and can provide both qualitative and quantitative information. Recent developments in computational methods offer an opportunity to use glycan structure databases and de novo algorithms to extract valuable information from MS or MS/MS data. However, detecting low-intensity peaks buried in noisy data sets remains a challenge, and an algorithm for accurate prediction and annotation of glycan structures from MS data is highly desirable. The present study describes a novel algorithm for glycan structure prediction by matching glycan isotope abundance (mGIA), which takes isotope masses, abundances, and spacing into account. We constructed a comprehensive database containing 808 glycan compositions and their corresponding isotope abundances. Unlike most previously reported methods, we took into account not only the m/z values of the peaks but also the logarithmic Euclidean distance between the calculated and detected isotope vectors. To improve the accuracy of automatic glycan structure annotation, we propose evaluating candidates against a linear classifier obtained by training the mGIA algorithm, in association with a Support Vector Machine (SVM), on datasets of three different human tissue samples from the Consortium for Functional Glycomics (CFG). In addition, an effective data preprocessing procedure was incorporated, including baseline subtraction, smoothing, peak centroiding, and composition matching, to extract correct isotope profiles from MS data. The algorithm was validated by analyzing the mouse kidney MS data from CFG, resulting in the identification of 6 more glycan compositions than the previous annotation and a significant improvement in the detection of weaker peaks compared with the previously reported algorithm.
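The isotope-matching idea in the abstract can be illustrated with a minimal Python sketch. The abstract does not give the exact formula, so everything here is an assumption: both isotope vectors are normalized to unit sum, a small `eps` guards against taking the log of zero, and the distance is the Euclidean norm of the difference of the log-abundances. The function names, the m/z tolerance, and the distance threshold are hypothetical; the fixed thresholds merely stand in for the SVM-trained linear classifier described by the authors.

```python
import math


def normalize(vector):
    """Scale an isotope abundance vector so its entries sum to 1."""
    total = sum(vector)
    if total <= 0:
        raise ValueError("isotope vector must have positive total intensity")
    return [x / total for x in vector]


def log_euclidean_distance(calculated, detected, eps=1e-6):
    """Euclidean distance between log-abundances of two isotope vectors.

    Assumption: this is one plausible reading of "logarithmic Euclidean
    distance"; the paper's exact definition may differ.
    """
    if len(calculated) != len(detected):
        raise ValueError("isotope vectors must have the same length")
    c = normalize(calculated)
    d = normalize(detected)
    return math.sqrt(
        sum((math.log(a + eps) - math.log(b + eps)) ** 2 for a, b in zip(c, d))
    )


def is_candidate_match(observed_mz, theoretical_mz, calculated, detected,
                       mz_tol=0.5, dist_threshold=1.0):
    """Accept a composition only if both m/z and isotope shape agree.

    mz_tol and dist_threshold are hypothetical placeholders for a
    trained decision boundary.
    """
    mz_ok = abs(observed_mz - theoretical_mz) <= mz_tol
    shape_ok = log_euclidean_distance(calculated, detected) <= dist_threshold
    return mz_ok and shape_ok
```

Because both vectors are normalized first, the distance depends only on the shape of the isotope envelope and not on absolute peak intensity, which is one reason a measure of this kind could help with weak peaks.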