Data processing of product ion spectra: Methods to control false discovery rate in compound search results for non-targeted metabolomics

Fumio Matsuda
DOI: https://doi.org/10.1101/2024.06.16.599235
2024-06-17
Abstract: In non-targeted metabolomics utilizing high-resolution mass spectrometry, several database search methods have been used to comprehensively annotate the acquired product ion spectra. Recent advancements in various in silico prediction techniques have facilitated compound searches by scoring the degree of coincidence between a query product ion spectrum and a compound in a compound database. Certain search results may be false positives, thus necessitating a method for controlling the false discovery rate (FDR). This study proposed two methods for controlling the FDR in compound search results. In the pseudo-target decoy method, the FDR can be estimated without creating a separate decoy database by treating, for example, the positive ion mode spectra as targets and converting the negative ion mode spectra into decoys. In the second-rank method, the score distribution of the second-ranked hits from the compound search is used as an approximation of the false-positive distribution of the top-ranked hits. The performance of these methods was evaluated by annotating the product ion spectra from MassBank using the SIRIUS 5 CSI:FingerID scoring method. The results indicated that the second-rank method came closer to the true FDR of 0.05. When applied to four human metabolomics datasets, the second-rank method provided more conservative results than the pseudo-target decoy method. These methods enabled the identification of metabolites not present in human metabolome databases. Overall, this study demonstrates the utility of these simple methods for FDR control in non-targeted metabolomics, facilitating more reliable compound identification and the potential discovery of novel metabolites.
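As a rough illustration of the second-rank idea described in the abstract, the sketch below estimates the FDR at a score threshold as the number of second-ranked scores at or above the threshold divided by the number of top-ranked scores at or above it. The abstract does not give the exact estimator, so this ratio, the function names `estimate_fdr` and `threshold_for_fdr`, and the convention that higher scores are better are all assumptions made for illustration. Under the same assumptions, passing decoy scores as `null_scores` would give a comparable estimate for the pseudo-target decoy method.

```python
from typing import Optional, Sequence


def estimate_fdr(top_scores: Sequence[float],
                 null_scores: Sequence[float],
                 threshold: float) -> float:
    """Estimate the FDR among top-ranked hits scoring >= threshold.

    null_scores approximates the false-positive score distribution
    (assumed here: second-ranked hit scores, or decoy scores).
    """
    accepted = sum(1 for s in top_scores if s >= threshold)
    if accepted == 0:
        return 0.0  # nothing accepted, so no false discoveries
    expected_false = sum(1 for s in null_scores if s >= threshold)
    # Cap at 1.0 because a rate above 100% is not meaningful.
    return min(1.0, expected_false / accepted)


def threshold_for_fdr(top_scores: Sequence[float],
                      null_scores: Sequence[float],
                      target_fdr: float = 0.05) -> Optional[float]:
    """Return the lowest observed top-hit score whose estimated FDR
    does not exceed target_fdr, or None if no threshold qualifies."""
    for t in sorted(set(top_scores)):
        if estimate_fdr(top_scores, null_scores, t) <= target_fdr:
            return t
    return None


# Toy scores, invented purely to exercise the functions.
top = [0.9, 0.8, 0.7, 0.6, 0.5]
second = [0.65, 0.4, 0.3, 0.2, 0.1]
cutoff = threshold_for_fdr(top, second, target_fdr=0.05)
```

The toy scores are not drawn from any dataset in the study; in practice the two score lists would come from the ranked output of the compound search for each query spectrum.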
Bioinformatics