NOREVA: normalization and evaluation of MS-based metabolomics data

Bo Li, Jing Tang, Qingxia Yang, Shuang Li, Xuejiao Cui, Yinghong Li, Yuzong Chen, Weiwei Xue, Xiaofeng Li, Feng Zhu
DOI: https://doi.org/10.1093/nar/gkx449
IF: 14.9
2017-05-19
Nucleic Acids Research
Abstract: Diverse forms of unwanted signal variations in mass spectrometry-based metabolomics data adversely affect the accuracies of metabolic profiling. A variety of normalization methods have been developed for addressing this problem. However, their performances vary greatly and depend heavily on the nature of the studied data. Moreover, given the complexity of the actual data, it is not feasible to assess the performance of methods by a single criterion. We therefore developed NOREVA to enable performance evaluation of various normalization methods from multiple perspectives. NOREVA integrated five well-established criteria (each with a distinct underlying theory) to ensure more comprehensive evaluation than any single criterion. It provided the most complete set of the available normalization methods, with unique features of removing overall unwanted variations based on quality control metabolites and allowing quality-control-sample-based correction sequentially followed by data normalization. The originality of NOREVA and the reliability of its algorithms were extensively validated by case studies on five benchmark datasets. In sum, NOREVA is distinguished by its capability of identifying a well-performing normalization method by taking multiple criteria into consideration and can be an indispensable complement to other available tools. NOREVA can be freely accessed at http://server.idrb.cqu.edu.cn/noreva/.
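The idea of "quality-control-sample-based correction sequentially followed by data normalization" can be pictured with the minimal sketch below. It assumes a samples × metabolites intensity matrix with positive values, a boolean mask marking QC samples, and per-sample batch labels. The batch-wise QC-median scaling and the simplified probabilistic quotient normalization (PQN) are stand-ins chosen for illustration, not NOREVA's own implementations, and all function names are hypothetical.

```python
import numpy as np


def qc_median_correction(X, is_qc, batch):
    """Hypothetical QC-sample-based correction (an assumption, not NOREVA's code).

    Within each batch, divide every metabolite (column) by its median over that
    batch's QC samples, then multiply by the metabolite's median over all QC
    samples so the corrected values stay on the original intensity scale.
    Assumes positive intensities and at least one QC sample per batch.
    """
    X = np.asarray(X, dtype=float)
    is_qc = np.asarray(is_qc, dtype=bool)
    batch = np.asarray(batch)
    overall_qc_median = np.median(X[is_qc], axis=0)
    corrected = np.empty_like(X)
    for b in np.unique(batch):
        rows = batch == b
        batch_qc_median = np.median(X[rows & is_qc], axis=0)
        corrected[rows] = X[rows] / batch_qc_median * overall_qc_median
    return corrected


def pqn(X):
    """Simplified probabilistic quotient normalization (PQN).

    Each sample (row) is divided by the median of its metabolite-wise quotients
    against a reference profile, here the median over all samples.
    """
    X = np.asarray(X, dtype=float)
    reference = np.median(X, axis=0)
    dilution = np.median(X / reference, axis=1)
    return X / dilution[:, None]


def correct_then_normalize(X, is_qc, batch):
    # QC-based correction first, then sample-wise normalization.
    return pqn(qc_median_correction(X, is_qc, batch))
```

For example, if batch 1 reads twice as high as batch 0 for the same study sample, the QC-based step maps both readings to the same corrected values before PQN removes any remaining sample-wise dilution.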
biochemistry & molecular biology
What problem does this paper attempt to address?
The paper addresses unwanted signal variation in mass-spectrometry-based metabolomics data. Experimental and biological sources of variation distort the measured signals, which lowers the accuracy of metabolic profiling and can compromise the identification of metabolite markers, and therefore the effectiveness of the whole analysis. To meet this challenge, the authors developed NOREVA, a tool that evaluates the performance of different normalization methods from multiple perspectives so that users can select the method best suited to a specific dataset. NOREVA integrates five established evaluation criteria, each with its own theoretical basis, to give a more comprehensive evaluation than any single criterion could:
1. **Ability to reduce intragroup variation among samples**: measured by how much the variation among samples of the same group decreases after normalization.
2. **Impact on differential metabolic analysis**: judged by the distribution of p-values and by clustering heatmaps, which show whether the normalized data can effectively distinguish the study groups.
3. **Consistency of the metabolite markers identified in different datasets**: a consistency score quantifies how strongly the markers identified in different data partitions overlap.
4. **Impact on classification accuracy**: measured by building a support vector machine (SVM) model and computing the area under the receiver operating characteristic curve (AUC).
5. **Correspondence between normalized and reference data**: assessed by comparing the log fold changes (logFCs) computed from the normalized data with those computed from the reference data.
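Several of these criteria reduce to simple statistics. The sketch below gives plain-NumPy stand-ins for criteria 1, 3, 4 and 5, assuming samples are rows, metabolites are columns, and a group label is given per sample. The pooled median absolute deviation (PMAD) is one common way to quantify intragroup variation; the pairwise Jaccard overlap is a simplified substitute for NOREVA's consistency score, not its actual formula; the AUC helper only scores classifier outputs (in NOREVA these would come from the SVM model); and the log2 fold-change correlation is one assumed way to measure correspondence with reference data. All function names are hypothetical.

```python
import numpy as np


def pmad(X, groups):
    """Criterion 1 stand-in: pooled median absolute deviation (lower is better).

    For each group, take every metabolite's median absolute deviation across the
    group's samples, average over metabolites, then average over groups.
    """
    X = np.asarray(X, dtype=float)
    groups = np.asarray(groups)
    per_group = []
    for g in np.unique(groups):
        sub = X[groups == g]
        mad = np.median(np.abs(sub - np.median(sub, axis=0)), axis=0)
        per_group.append(mad.mean())
    return float(np.mean(per_group))


def overlap_score(marker_sets):
    """Criterion 3 stand-in (not NOREVA's consistency score): mean pairwise
    Jaccard overlap of the marker sets found in different data partitions."""
    scores = []
    for i in range(len(marker_sets)):
        for j in range(i + 1, len(marker_sets)):
            a, b = set(marker_sets[i]), set(marker_sets[j])
            union = a | b
            scores.append(len(a & b) / len(union) if union else 1.0)
    return float(np.mean(scores))


def auc(scores, labels):
    """Criterion 4 helper: ROC AUC of classifier scores via the Mann-Whitney
    formulation; a tied positive/negative pair counts as one half."""
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=bool)
    pos, neg = scores[labels], scores[~labels]
    greater = (pos[:, None] > neg[None, :]).sum()
    ties = (pos[:, None] == neg[None, :]).sum()
    return float((greater + 0.5 * ties) / (len(pos) * len(neg)))


def logfc_correspondence(X_norm, X_ref, groups, case, control):
    """Criterion 5 stand-in: Pearson correlation between per-metabolite log2
    fold changes (case vs control group means) of normalized and reference data.
    Assumes both matrices share the same samples and strictly positive values."""
    g = np.asarray(groups)

    def log2fc(X):
        X = np.asarray(X, dtype=float)
        return np.log2(X[g == case].mean(axis=0) / X[g == control].mean(axis=0))

    return float(np.corrcoef(log2fc(X_norm), log2fc(X_ref))[0, 1])
```

Each function returns a single number per normalized dataset, so the same normalized matrix can be scored on several criteria side by side.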
NOREVA not only provides a broad set of normalization methods but also lets users evaluate each method against all five criteria, helping them identify a well-performing normalization method for their own data before downstream metabolomics analysis.
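The basic workflow of applying several normalization methods to one dataset and scoring each can be sketched as follows. This is a hypothetical illustration, not NOREVA's code: it uses only three simple methods (no normalization, total-sum scaling, median scaling) and a single criterion, a PMAD computed on log2 data so that a global scale factor introduced by a method does not affect the score. The names `METHODS` and `compare_methods` are made up for this example.

```python
import numpy as np


def total_sum(X):
    # Scale each sample (row) so its intensities sum to 1.
    return X / X.sum(axis=1, keepdims=True)


def median_norm(X):
    # Scale each sample so its median intensity is 1.
    return X / np.median(X, axis=1, keepdims=True)


METHODS = {"none": lambda X: X, "total_sum": total_sum, "median": median_norm}


def pmad_log2(X, groups):
    # Pooled median absolute deviation on log2 data: a global scale factor
    # shifts all log2 values equally and therefore leaves the MAD unchanged.
    L = np.log2(X)
    mads = []
    for g in np.unique(groups):
        sub = L[groups == g]
        mads.append(np.median(np.abs(sub - np.median(sub, axis=0)), axis=0).mean())
    return float(np.mean(mads))


def compare_methods(X, groups, methods=METHODS):
    """Apply each normalization method and score it (lower PMAD is better)."""
    X = np.asarray(X, dtype=float)
    groups = np.asarray(groups)
    return {name: pmad_log2(f(X), groups) for name, f in methods.items()}
```

On toy data whose samples differ only by a dilution factor, both scaling methods drive the score to zero while "none" does not; on real data the methods separate less cleanly, which is why the paper argues for judging them on all five criteria rather than on one.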