The Hitchhiker’s Guide to Statistical Analysis of Feature-based Molecular Networks from Non-Targeted Metabolomics Data

Abzer K. Pakkir Shah, Axel Walter, Daniel Petras, Filip Ottosson, Francesco Russo, Marcelo Navarro-Díaz, Judith Boldt, Jarmo-Charles Kalinski, Eftychia E. Kontou, James Elofson, Alexandros Polyzois, Carolina González-Marín, Shane Farrell, Marie R. Aggerbeck, Thapanee Pruksatrakul, Nathan Chan, Yunshu Wang, Magdalena Pöchhacker, Corinna Brungs, Beatriz Cámara, Andrés M. Caraballo-Rodríguez, Andres Cumsille, Fernanda de Oliveira, Kai Dührkop, Yasin El Abiead, Christian Geibel, Lana G. Graves, Martin Hansen, Steffen Heuckeroth, Simon Knoblauch, Anastasiia Kostenko, Mirte CM. Kuijpers, Kevin Mildau, Stilianos Papadopoulos Lambidis, Paulo Wender Portal Gomes, Tilman Schramm, Karoline Steuer-Lodd, Paolo Stincone, Sibgha Tayyab, Giovanni Andrea Vitale, Berenike C. Wagner, Shipei Xing, Marquis T. Yazzie, Simone Zuffa, Martinus de Kruijff, Christine Beemelmanns, Hannes Link, Christoph Mayer, Justin JJ van der Hooft, Tito Damiani, Tomáš Pluskal, Pieter C. Dorrestein, Jan Stanstrup, Robin Schmid, Mingxun Wang, Allegra T. Aron, Madeleine Ernst
DOI: https://doi.org/10.26434/chemrxiv-2023-wwbt0
2023-11-01
Abstract: Feature-Based Molecular Networking (FBMN) is a popular analysis approach for LC-MS/MS-based non-targeted metabolomics data. While processing LC-MS/MS data through FBMN is fairly streamlined, downstream data handling and statistical interrogation is often a key bottleneck. Especially, users new to statistical analysis struggle to effectively handle and analyze complex data matrices. In this protocol, we provide a comprehensive guide for the statistical analysis of FBMN results. We explain the data structure and principles of data clean-up and normalization, as well as uni- and multivariate statistical analysis of FBMN results. We provide explanations and code in two scripting languages (R and Python) as well as the QIIME2 framework for all protocol steps, from data clean-up to statistical analysis. Additionally, the protocol is accompanied by a web application with a graphical user interface (https://fbmn-statsguide.gnps2.org/), to lower the barrier of entry for new users. Together, the protocol, code, and web app provide a complete guide and toolbox for FBMN data integration, clean-up, and advanced statistical analysis, enabling new users to uncover molecular insights from their non-targeted metabolomics data. Our protocol is tailored for the seamless analysis of FBMN results from Global Natural Products Social Molecular Networking (GNPS and GNPS2) and can be adapted to other MS feature detection, annotation, and networking tools.
Chemistry
What problem does this paper attempt to address?
The paper presents a comprehensive guide to the statistical analysis of Feature-Based Molecular Networking (FBMN) results from non-targeted metabolomics data. It aims to address the following issues:

1. **Statistical Analysis Bottlenecks**: Although FBMN provides a relatively streamlined workflow for processing liquid chromatography-tandem mass spectrometry (LC-MS/MS) data, downstream data handling and statistical analysis remain a key bottleneck, especially for users new to statistical analysis.
2. **Management and Analysis of Complex Data Matrices**: FBMN results carry several layers of information, so multiple matrix operations, data cleaning, and normalization steps are needed before univariate and multivariate statistical analysis can be performed. These steps often rely on custom scripts or software tools scattered across different platforms, which is a challenge for new users.
3. **Lack of Comprehensive Guidance**: Tools exist for individual cleaning and analysis steps, but no comprehensive, user-friendly guide covers the entire process from data preparation of FBMN results to advanced statistical analysis.

To address these issues, the paper:

- Provides a detailed one-stop guide that runs from feature detection and spectral annotation to data cleaning and statistical analysis.
- Includes code examples in R, Python, and the QIIME 2 framework to support each step of the protocol.
- Offers a web application with a graphical user interface (https://fbmn-statsguide.gnps2.org/) to lower the entry barrier for new users.

In summary, the paper gives researchers a complete toolbox and guide for managing and analyzing FBMN results from non-targeted metabolomics data, so that new users can overcome the statistical hurdles and uncover molecular insights from their data.
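To make the clean-up and normalization steps mentioned above more concrete, here is a minimal, dependency-free Python sketch. It is not the protocol's own code: the toy feature table, the sample names, the blank-removal cutoff of 0.3, the minimum-value imputation, and the helper function names are all assumptions chosen for illustration. The log2 fold change at the end is only a simple stand-in for the univariate tests that a real analysis would use.

```python
from math import log2
from statistics import mean

# Hypothetical toy feature table: feature -> {sample: peak area}.
# All names and values are invented for illustration only.
feature_table = {
    "feature_1": {"blank_1": 100.0, "ctrl_1": 1000.0, "ctrl_2": 1200.0,
                  "treat_1": 3000.0, "treat_2": 2800.0},
    "feature_2": {"blank_1": 500.0, "ctrl_1": 600.0, "ctrl_2": 550.0,
                  "treat_1": 580.0, "treat_2": 620.0},
    "feature_3": {"blank_1": 0.0, "ctrl_1": 0.0, "ctrl_2": 400.0,
                  "treat_1": 500.0, "treat_2": 450.0},
}

# Hypothetical metadata: sample -> group label.
sample_group = {
    "blank_1": "blank",
    "ctrl_1": "control", "ctrl_2": "control",
    "treat_1": "treatment", "treat_2": "treatment",
}


def remove_blank_features(table, groups, cutoff=0.3):
    """Keep features whose blank/sample mean ratio is below the cutoff.

    The cutoff value is an assumption; blank columns are dropped afterwards.
    """
    kept = {}
    for feat, row in table.items():
        blanks = [v for s, v in row.items() if groups[s] == "blank"]
        samples = [v for s, v in row.items() if groups[s] != "blank"]
        sample_mean = mean(samples)
        ratio = mean(blanks) / sample_mean if sample_mean > 0 else float("inf")
        if ratio < cutoff:
            kept[feat] = {s: v for s, v in row.items() if groups[s] != "blank"}
    return kept


def impute_zeros(table):
    """Replace zeros with the smallest non-zero value in the table (assumed strategy)."""
    floor = min(v for row in table.values() for v in row.values() if v > 0)
    return {feat: {s: (v if v > 0 else floor) for s, v in row.items()}
            for feat, row in table.items()}


def tic_normalize(table):
    """Divide each value by its sample's summed intensity (total ion current)."""
    samples = list(next(iter(table.values())))
    totals = {s: sum(row[s] for row in table.values()) for s in samples}
    return {feat: {s: v / totals[s] for s, v in row.items()}
            for feat, row in table.items()}


def log2_fold_change(table, groups, numerator, denominator):
    """Per-feature log2 ratio of group means."""
    result = {}
    for feat, row in table.items():
        num = mean([v for s, v in row.items() if groups[s] == numerator])
        den = mean([v for s, v in row.items() if groups[s] == denominator])
        result[feat] = log2(num / den)
    return result


cleaned = remove_blank_features(feature_table, sample_group)
imputed = impute_zeros(cleaned)
normalized = tic_normalize(imputed)
fold_changes = log2_fold_change(normalized, sample_group, "treatment", "control")

for feat, fc in fold_changes.items():
    print(feat, round(fc, 3))
```

In this toy table, `feature_2` is dominated by the blank signal and is filtered out before imputation and normalization. The order of the steps (blank removal, then imputation, then normalization) is one plausible choice for this sketch; a real workflow would read the FBMN quantification table and metadata from files and use a data-frame library.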