3D-MSNet: A point cloud based deep learning model for untargeted feature detection and quantification in profile LC-HRMS data
Ruimin Wang, Miaoshan Lu, Shaowei An, Jinyin Wang, Changbin Yu
DOI: https://doi.org/10.1093/bioinformatics/btad195
IF: 5.8
2023-04-18
Bioinformatics
Abstract:
Motivation: Liquid chromatography coupled with high-resolution mass spectrometry (LC-HRMS) is widely used for composition profiling in untargeted metabolomics research. Mass spectrometry (MS) data retain complete sample information, but they are also inherently high-dimensional, highly complex, and very large. No mainstream quantification method can perform direct three-dimensional analysis on lossless profile MS signals. All existing software simplifies computation through dimensionality reduction or lossy grid transformation. This ignores the full three-dimensional signal distribution of MS data and leads to inaccurate feature detection and quantification.
Results: Neural networks are effective for high-dimensional data analysis and can discover implicit features in large amounts of complex data. On this basis, we propose 3D-MSNet, a novel deep-learning-based model for untargeted feature extraction. 3D-MSNet performs feature detection directly on three-dimensional MS point clouds, formulating it as an instance segmentation task. After training the model on a self-annotated 3D feature dataset, we compared it with nine popular software tools (MS-DIAL, MZmine 2, XCMS Online, MarkerView, Compound Discoverer, MaxQuant, Dinosaur, DeepIso, PointIso) on two public metabolomics benchmark datasets and one public proteomics benchmark dataset. 3D-MSNet outperformed the other software, with significant improvements in feature detection and quantification accuracy on all evaluation datasets. Furthermore, 3D-MSNet extracts features robustly and can be widely applied to profile MS data acquired on various high-resolution mass spectrometers at different resolutions.
Availability: 3D-MSNet is open-source and freely available at https://github.com/CSi-Studio/3D-MSNet under a permissive license. The benchmark datasets, training dataset, evaluation methods, and results are available at https://doi.org/10.5281/zenodo.6582912.
Supplementary information: Supplementary data are available at Bioinformatics online.
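The abstract contrasts direct analysis of lossless three-dimensional profile signals with the dimensionality reduction or lossy grid transformation used by existing software. The following minimal sketch illustrates that difference. It assumes hypothetical in-memory scans of (retention time, m/z array, intensity array) and an arbitrary bin width. It does not reproduce 3D-MSNet's code, and it leaves out the instance segmentation network entirely. Every profile data point becomes one (rt, m/z, intensity) row of a point cloud. By contrast, the grid transform sums all points that land in the same cell and discards their individual m/z positions.

```python
from typing import Dict, List, Tuple

import numpy as np

# Hypothetical profile-mode scan: (retention time in s, m/z array, intensity array).
Scan = Tuple[float, np.ndarray, np.ndarray]


def scans_to_point_cloud(scans: List[Scan]) -> np.ndarray:
    """Keep every profile data point as one (rt, mz, intensity) row; no binning."""
    rows = []
    for rt, mz, intensity in scans:
        rt_col = np.full(mz.shape, rt, dtype=float)  # repeat the scan's RT per point
        rows.append(np.column_stack((rt_col, mz, intensity)))
    return np.vstack(rows)


def grid_transform(cloud: np.ndarray, rt_bin: float, mz_bin: float) -> Dict[Tuple[int, int], float]:
    """Lossy alternative: sum the intensities of all points sharing an (rt, mz) cell."""
    grid: Dict[Tuple[int, int], float] = {}
    for rt, mz, intensity in cloud:
        key = (int(rt // rt_bin), int(mz // mz_bin))
        grid[key] = grid.get(key, 0.0) + float(intensity)
    return grid


# Two hypothetical scans, each sampling the same m/z peak at three profile points.
scans = [
    (60.0, np.array([301.1402, 301.1410, 301.1418]), np.array([120.0, 950.0, 130.0])),
    (61.0, np.array([301.1403, 301.1411, 301.1419]), np.array([180.0, 1400.0, 170.0])),
]
cloud = scans_to_point_cloud(scans)
grid = grid_transform(cloud, rt_bin=5.0, mz_bin=0.01)
print(cloud.shape)  # (6, 3)
print(len(grid))    # 1
```

With these assumed bin widths, the six profile points collapse into a single grid cell. The point cloud still preserves the peak's shape along both the m/z and retention time axes, and a point-cloud model can operate on that shape directly.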
biochemical research methods, biotechnology & applied microbiology, mathematical & computational biology