Automatic annotation and visualization tool for mass spectrometry based glycomics.

Guang Xu, Xin Liu, Qing Yan Liu, Jianjun Li, Yanhong Zhou
DOI: https://doi.org/10.1002/rcm.7735
IF: 2.586
2016-01-01
Rapid Communications in Mass Spectrometry
Abstract:
Rationale: With the development of glycomics, a large number of glycan structures have been determined using mass spectrometry (MS)-based techniques. However, most glycan MS data must still be annotated manually, which is time-consuming, unreliable and inaccurate.
Methods: Herein we report a tool for automatically annotating and browsing N-glycan masses and isotopic distributions. We first constructed a training dataset using the Consortium for Functional Glycomics database, in conjunction with data preprocessing and filtering by composition matching. In addition, we improved a glycan isotope abundance matching algorithm by identifying potential overlap regions and constructing an optimization model, so that it can deconvolute overlapped glycan isotopic clusters.
Results: In the matching process, if the m/z difference between two detected ions was close to an integer from 1 to 5, the m/z range from the lower m/z to that m/z + 5 was considered a potential overlap region. More than 20 potential overlap regions were found in each group of data from the CHO sample and the human testing sample. Because the training dataset was imbalanced, we combined the Support Vector Machines (SVMs) algorithm with different sampling techniques, including the Synthetic Minority Over-sampling Technique (SMOTE), to classify all potential candidate compositions. The results demonstrated an average increase of 26.8% in annotation sensitivity with the SMOTE-SVMs algorithm. The source code can be obtained from .
Conclusions: We have developed a new tool which facilitates high-throughput glycomics research and assists mass spectrometrists in the interpretation and annotation of glycan samples. Copyright (c) 2016 John Wiley & Sons, Ltd.
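The overlap-region rule stated in the Results can be illustrated with a minimal Python sketch. It is not the authors' implementation. The function name `find_overlap_regions`, the `tolerance` of 0.05 used to decide what counts as "close to an integer", and the choice to report one window per lower ion are all assumptions made for illustration; the abstract does not specify them.

```python
from typing import List, Sequence, Tuple


def find_overlap_regions(
    mz_values: Sequence[float],
    tolerance: float = 0.05,  # assumed cutoff for "close to an integer"
    max_offset: int = 5,      # integer differences 1..5, as in the abstract
) -> List[Tuple[float, float]]:
    """Flag potential overlap regions among detected ion m/z values.

    For each ion, look at higher-m/z ions within max_offset. If any
    difference is within `tolerance` of an integer from 1 to max_offset,
    report the window (lower m/z, lower m/z + max_offset).
    """
    peaks = sorted(mz_values)
    regions: List[Tuple[float, float]] = []
    for i, low in enumerate(peaks):
        for high in peaks[i + 1:]:
            diff = high - low
            if diff > max_offset + tolerance:
                break  # remaining ions are too far away to qualify
            nearest = round(diff)
            if 1 <= nearest <= max_offset and abs(diff - nearest) <= tolerance:
                regions.append((low, low + max_offset))
                break  # one window per lower ion is enough
    return regions


if __name__ == "__main__":
    # Hypothetical m/z values, not taken from the paper's data.
    detected = [1000.00, 1002.01, 1010.50, 1500.30, 1504.28]
    for start, end in find_overlap_regions(detected):
        print(f"potential overlap region: {start:.2f} to {end:.2f}")
```

In this sketch each flagged window would then be passed to the deconvolution step; windows that overlap one another are left unmerged, which is a simplification.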