The development of machine learning approaches in two-dimensional NMR data interpretation for metabolomics applications

Julie Pollak, Moses Mayonu, Lin Jiang, Bo Wang
DOI: https://doi.org/10.1016/j.ab.2024.115654
2024-08-24
Abstract: Metabolomics has been widely applied in the study of human diseases and in environmental science to characterize systematic changes in metabolites in response to diverse stimuli. NMR-based metabolomics is widely used, but peak overlap in one-dimensional (1D) NMR spectra can limit the accuracy of quantitative analysis in metabolomics applications. Two-dimensional (2D) NMR has been applied to resolve the 1D overlap problem, but processing 2D NMR data remains challenging. In this study, we built an automatic approach that uses machine learning to process 2D NMR data for quantitative applications. Partial least squares discriminant analysis (PLS-DA), artificial neural network classification (ANN-DA), gradient-boosted tree classification (XGBoost-DA), and deep-learning artificial neural network classification (ANNDL-DA) were applied in combination with an automatic peak selection approach. Standard mixtures, sea anemone extracts, and mouse fecal samples were tested to demonstrate the approach. Our results showed that ANN-DA and ANNDL-DA selected 2D NMR peaks with high accuracy (around 90%), indicating strong potential for quantitative 2D NMR-based metabolomics studies, whereas PLS-DA and XGBoost-DA showed limitations related to either data variation or overfitting. Our study provides an automatic approach for applying 2D NMR data to routine quantitative analysis in metabolomics.
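The abstract describes discriminant-analysis (classification) models used to select 2D NMR peaks automatically. As a minimal sketch of that idea, assuming each candidate cross-peak has already been reduced to a small feature vector, the snippet below trains a one-hidden-layer neural network in NumPy to separate genuine peaks from noise. The feature set (signal-to-noise ratio and F1/F2 linewidths), the synthetic data, and the function names `make_synthetic_peaks`, `train_ann`, and `predict` are hypothetical illustrations, not the authors' features, architecture, or data.

```python
import numpy as np


def make_synthetic_peaks(n, rng):
    """Hypothetical candidate-peak features (synthetic, NOT data from the paper).

    Columns: signal-to-noise ratio, linewidth in F1 (Hz), linewidth in F2 (Hz).
    Label 1 = genuine cross-peak, 0 = noise or artifact.
    """
    labels = rng.integers(0, 2, size=n)
    snr = np.where(labels == 1, rng.normal(20, 4, n), rng.normal(4, 1.5, n))
    lw_f1 = np.where(labels == 1, rng.normal(12, 2, n), rng.normal(30, 8, n))
    lw_f2 = np.where(labels == 1, rng.normal(3, 0.5, n), rng.normal(8, 3, n))
    return np.column_stack([snr, lw_f1, lw_f2]), labels.astype(float)


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def train_ann(X, y, hidden=8, lr=0.5, epochs=500, seed=0):
    """Full-batch gradient descent on mean binary cross-entropy (tanh hidden layer)."""
    rng = np.random.default_rng(seed)
    W1 = rng.normal(0, 0.5, (X.shape[1], hidden))
    b1 = np.zeros(hidden)
    W2 = rng.normal(0, 0.5, hidden)
    b2 = 0.0
    n = len(y)
    for _ in range(epochs):
        h = np.tanh(X @ W1 + b1)        # hidden activations, shape (n, hidden)
        p = sigmoid(h @ W2 + b2)        # predicted P(genuine peak), shape (n,)
        grad_out = (p - y) / n          # d(mean BCE)/d(output logit)
        grad_h = np.outer(grad_out, W2) * (1 - h ** 2)  # backprop through tanh
        W2 = W2 - lr * (h.T @ grad_out)
        b2 = b2 - lr * grad_out.sum()
        W1 = W1 - lr * (X.T @ grad_h)
        b1 = b1 - lr * grad_h.sum(axis=0)
    return W1, b1, W2, b2


def predict(params, X):
    """Return 1.0 for candidates classified as genuine peaks, else 0.0."""
    W1, b1, W2, b2 = params
    return (sigmoid(np.tanh(X @ W1 + b1) @ W2 + b2) >= 0.5).astype(float)


rng = np.random.default_rng(42)
X, y = make_synthetic_peaks(400, rng)
X_train, X_test, y_train, y_test = X[:300], X[300:], y[:300], y[300:]

# Standardize with training-set statistics only, so the test split stays unseen.
mu, sd = X_train.mean(axis=0), X_train.std(axis=0)
params = train_ann((X_train - mu) / sd, y_train)
accuracy = float((predict(params, (X_test - mu) / sd) == y_test).mean())
print(f"held-out accuracy on synthetic peaks: {accuracy:.2f}")
```

The accuracy printed here reflects only the deliberately well-separated synthetic features and says nothing about the roughly 90% reported in the abstract for real spectra. Evaluating on a held-out split, as above, is the simplest check for the kind of overfitting the abstract mentions.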