The Effect of Data Integration on LC-MS-based Metabolomics Data: Evaluation on the Comparative Classification Capacities

Xuejiao Cui, Xiaoyu Zhang, Feng Zhu
DOI: https://doi.org/10.1088/1755-1315/252/3/032166
2019-01-01
IOP Conference Series: Earth and Environmental Science
Abstract: Data from large-scale LC-MS-based metabolomics experiments are generally collected over long periods, from months to years, and have to be divided into several batches, so data integration is essential for combining the batches into one large dataset for data processing and statistical analysis. This study evaluates the direct data merge strategy by comparing its classification capacity with that of result integration and of single experiments. The classification capacity of each model is evaluated by receiver operating characteristic (ROC) analysis and the area under the curve (AUC), using a Support Vector Machine (SVM) applied to both training and testing datasets with the biomarkers identified by Student's t-test (p-value < 0.05). Direct data merge was found to outperform both result integration and single experiments as assessed in this study. In sum, this study demonstrates the classification accuracy of direct data merge in metabolomics profiling, which provides critical information for data integration in current metabolomics studies.
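To make the AUC measure named in the abstract concrete, the following is a minimal sketch in plain Python. It computes AUC as the fraction of positive/negative sample pairs in which the positive sample receives the higher score, which is equivalent to the area under the ROC curve. The helper `roc_auc` and the labels and scores are hypothetical illustrations, not the authors' code or data; in the study the scores would come from the SVM.

```python
def roc_auc(labels, scores):
    """AUC as the probability that a randomly chosen positive sample
    scores higher than a randomly chosen negative one (ties count 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one sample of each class")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0   # positive ranked above negative
            elif p == n:
                wins += 0.5   # tie counts as half a win
    return wins / (len(pos) * len(neg))


# Hypothetical classifier scores for six held-out samples
# (1 = case, 0 = control); invented for illustration only.
labels = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.5, 0.3, 0.1]
auc = roc_auc(labels, scores)
print(f"AUC = {auc:.3f}")
```

An AUC of 1.0 means every case is ranked above every control, while 0.5 corresponds to random ranking, so comparing AUC values across integration strategies compares how well each separates the two groups.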