Gradient Boosting Model for Unbalanced Quantitative Mass Spectra Quality Assessment

Tianjun Li, Tong Zhang, Long Chen
DOI: https://doi.org/10.1109/spac.2017.8304311
2017-01-01
Abstract: This paper describes a method for controlling the quality of isotope-labeled mass spectra. In such spectra, the profiles of labeled (heavy) and unlabeled (light) peptide pairs provide valuable information about the studied biological samples under different conditions. The core task of quality control in a quantitative LC-MS experiment is to filter out low-quality spectra, or peptides with erroneous profiles. The most commonly used approach is to train a classifier that separates the spectra into positive (high-quality) and negative (low-quality) classes. However, because error profiles are few, the training data are dominated by positive samples, i.e., the class imbalance problem. We therefore employ the synthetic minority over-sampling technique (SMOTE) to handle the unbalanced data and then apply an extreme gradient boosting (Xgboost) model as the classifier. We assessed samples with different heavy-light peptide ratios using the trained Xgboost classifier and found that the SMOTE Xgboost classifier significantly increases the reliability of peptide ratio estimates.
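To make the oversampling step concrete, here is a minimal sketch of the SMOTE idea in plain Python: each synthetic minority sample is a random interpolation between an existing minority sample and one of its k nearest minority neighbours. The function name `smote`, the two-dimensional toy features, and the parameter values are assumptions chosen for illustration and are not taken from the paper. The classifier step is omitted; in practice one would more likely combine `SMOTE` from imbalanced-learn with `XGBClassifier` from xgboost.

```python
import random


def smote(minority, n_new, k=5, seed=0):
    """Create n_new synthetic samples from the minority class.

    Each synthetic point lies on the line segment between a randomly
    chosen minority sample and one of its k nearest minority neighbours.
    """
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        x = minority[i]
        # Rank the other minority samples by squared Euclidean distance to x.
        others = [m for j, m in enumerate(minority) if j != i]
        others.sort(key=lambda m: sum((a - b) ** 2 for a, b in zip(x, m)))
        neighbour = rng.choice(others[:k])
        gap = rng.random()  # interpolation factor in [0, 1)
        synthetic.append([a + gap * (b - a) for a, b in zip(x, neighbour)])
    return synthetic


# Hypothetical toy features: few low-quality (negative) spectra,
# many high-quality (positive) ones.
low_quality = [[0.10, 0.90], [0.20, 0.80], [0.15, 0.70], [0.30, 0.85]]
high_quality = [[1.0 + 0.01 * i, 0.1] for i in range(20)]

# Oversample the minority class until both classes have the same size.
new_samples = smote(low_quality, len(high_quality) - len(low_quality), k=3)
balanced_low = low_quality + new_samples
```

After this step both classes contain 20 samples, so a classifier trained on `balanced_low` and `high_quality` is no longer dominated by the positive class.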