Five Easy Metrics of Data Quality for LC–MS-Based Global Metabolomics

Xinyu Zhang, Jiyang Dong, Daniel Raftery
DOI: https://doi.org/10.1021/acs.analchem.0c01493
IF: 7.4
2020-08-31
Analytical Chemistry
Abstract: Data quality in global metabolomics is of great importance for biomarker discovery and systems biology studies. However, comprehensive metrics and methods to evaluate and compare the data quality of global metabolomics data sets are lacking. In this work, we combine newly developed metrics with well-known measures to comprehensively and quantitatively characterize the data quality across two similar liquid chromatography coupled with mass spectrometry (LC–MS) platforms, with the goal of providing an efficient and improved ability to evaluate the data quality in global metabolite profiling experiments. A pooled human serum sample was run 50 times on two high-resolution LC-QTOF-MS platforms to provide profile and centroid MS data. These data were processed using Progenesis QI software and then analyzed using five important data quality measures: retention time drift, the number of compounds detected, missing values, and MS reproducibility (two measures). The detected compounds were fit to a γ distribution versus compound abundance, which was normalized to allow comparison of different platforms. To evaluate missing values, characteristic curves were obtained by plotting the compound detection percentage versus extraction frequency. To characterize reproducibility, the accumulative coefficient of variation (CV) versus the percentage of total compounds detected and the intraclass correlation coefficient (ICC) versus compound abundance were investigated. Key findings include significantly better performance using profile mode data compared to centroid mode, as well as quantitatively better performance from the newer, higher resolution instrument. A summary table of results gives a snapshot of the experimental results and provides a template to evaluate the global metabolite profiling workflow.
In total, these measures give a good overall view of data quality in global profiling and allow comparisons of data acquisition strategies and platforms as well as optimization of parameters.
The Supporting Information is available free of charge at https://pubs.acs.org/doi/10.1021/acs.analchem.0c01493. Flow chart to process the raw LC–MS data (Figure S1), detected compounds and missing values versus compound abundance (Figure S2), the Pearson correlation coefficient versus compound abundance (Figure S3), missing-value performance for the reduced number of samples (Figure S4), detected compounds and missing values versus compound abundance for five QCs (Figure S5), the percentage of compounds versus CV (Figure S6), the 3-D plot of same versus log10(abundance) for five QCs, ICC values versus the percentage of compounds, the 3-D plot of same versus CV for five QCs (Figure S7), detected compounds and missing values versus compound abundance for five QCs (Figure S8), and the Pearson correlation coefficient versus compound abundance for five QCs (Figure S9) (PDF).
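As an illustration of the missing-value and CV measures named in the abstract, the following is a minimal Python sketch, not the authors' implementation. It rests on assumptions that the abstract does not spell out: "extraction frequency" is taken to mean the fraction of replicate runs in which a compound is detected, and the accumulative CV curve is taken to be the percentage of compounds whose CV falls at or below a threshold. The function names, the `None`-for-missing convention, and the toy abundance table are all hypothetical.

```python
from statistics import mean, stdev

# Hypothetical toy table: one row per compound, one column per replicate run.
# None marks a missing value (compound not detected in that run).
table = [
    [100.0, 102.0, 98.0, 101.0],  # detected in 4/4 runs
    [50.0, None, 55.0, 45.0],     # detected in 3/4 runs
    [10.0, None, None, 30.0],     # detected in 2/4 runs
    [None, None, None, 5.0],      # detected in 1/4 runs
]


def detection_curve(rows, frequencies):
    """Percentage of compounds detected in at least a fraction f of runs.

    Assumed reading of "detection percentage versus extraction frequency".
    """
    curve = []
    for f in frequencies:
        kept = sum(
            1
            for values in rows
            if sum(v is not None for v in values) / len(values) >= f
        )
        curve.append(100.0 * kept / len(rows))
    return curve


def coefficient_of_variation(values):
    """CV in percent over the non-missing values; None if it cannot be computed."""
    present = [v for v in values if v is not None]
    if len(present) < 2:
        return None
    m = mean(present)
    if m == 0:
        return None
    return 100.0 * stdev(present) / m


def cumulative_cv_curve(rows, thresholds):
    """Percentage of compounds (with a computable CV) whose CV is <= each threshold."""
    cvs = [coefficient_of_variation(values) for values in rows]
    cvs = [c for c in cvs if c is not None]
    return [100.0 * sum(c <= t for c in cvs) / len(cvs) for t in thresholds]


if __name__ == "__main__":
    print(detection_curve(table, [0.25, 0.5, 0.75, 1.0]))
    print(cumulative_cv_curve(table, [5.0, 20.0, 100.0]))
```

In this sketch, compounds with fewer than two detected values are left out of the CV curve; whether the paper handles such compounds the same way is not stated in the abstract.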
chemistry, analytical