Abstract

Background: Modern analytical methods in biology and chemistry use separation techniques coupled to sensitive detectors, such as gas chromatography-mass spectrometry (GC-MS) and liquid chromatography-mass spectrometry (LC-MS). These hyphenated methods provide high-dimensional data. Comparing such data manually to find corresponding signals is a laborious task, as each experiment usually consists of thousands of individual scans, each containing hundreds or even thousands of distinct signals. In order to allow for successful identification of metabolites or proteins within such data, especially in the context of metabolomics and proteomics, an accurate alignment and matching of corresponding features between two or more experiments is required. Such a matching algorithm should capture fluctuations in the chromatographic system which lead to non-linear distortions on the time axis, as well as systematic changes in recorded intensities. Many different algorithms for the retention time alignment of GC-MS and LC-MS data have been proposed and published, but all of them focus either on aligning previously extracted peak features or on aligning and comparing the complete raw data containing all available features.

Results: In this paper we introduce two algorithms for retention time alignment of multiple GC-MS datasets: multiple alignment by bidirectional best hits peak assignment and cluster extension (BIPACE) and center-star multiple alignment by pairwise partitioned dynamic time warping (CeMAPP-DTW). We show how the similarity-based peak group matching method BIPACE may be used for multiple alignment calculation individually and how it can be used as a preprocessing step for the pairwise alignments performed by CeMAPP-DTW. We evaluate the algorithms individually and in combination on a previously published small GC-MS dataset studying the Leishmania parasite and on a larger GC-MS dataset studying grains of wheat (Triticum aestivum).

Conclusions: We have shown that BIPACE achieves very high precision and recall and a very low number of false positive peak assignments on both evaluation datasets. CeMAPP-DTW finds a high number of true positives when executed on its own, but achieves even better results when BIPACE is used to constrain its search space. The source code of both algorithms is included in the open-source software framework Maltcms, which is available from http://maltcms.sf.net. The evaluation scripts of the present study are available from the same source.
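The bidirectional best hits idea behind BIPACE can be illustrated with a minimal pairwise sketch. This is not the Maltcms implementation: the function names are hypothetical, cosine similarity between binned mass spectra is an assumed stand-in for whichever similarity function the authors use, and the cluster extension step that merges pairwise hits across multiple runs is left out. A peak in one run is paired with a peak in the other only if each is the other's most similar partner.

```python
from math import sqrt

def cosine(a, b):
    """Cosine similarity between two equal-length intensity vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = sqrt(sum(x * x for x in a)) * sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def best_hits(src, dst):
    """For each peak in src, return the index of its most similar peak in dst."""
    return [max(range(len(dst)), key=lambda j: cosine(s, dst[j])) for s in src]

def bidirectional_best_hits(run_a, run_b):
    """Return (i, j) pairs where peak i of run_a and peak j of run_b
    are each other's best hit. Assumed similarity: cosine (hypothetical choice)."""
    if not run_a or not run_b:
        return []
    a_to_b = best_hits(run_a, run_b)
    b_to_a = best_hits(run_b, run_a)
    return [(i, j) for i, j in enumerate(a_to_b) if b_to_a[j] == i]

# Toy data (invented for illustration): each peak is a binned mass spectrum.
run_a = [[100, 0, 20, 0], [0, 80, 0, 10], [5, 5, 90, 0]]
run_b = [[0, 75, 0, 12], [95, 0, 25, 0], [0, 0, 10, 60]]

pairs = bidirectional_best_hits(run_a, run_b)
print(pairs)
```

In this toy example the third peak of `run_a` prefers the second peak of `run_b`, but that peak prefers the first peak of `run_a`, so the third peak stays unassigned. Requiring agreement in both directions is what keeps the number of false positive assignments low, at the cost of leaving ambiguous peaks unmatched.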

DeepRTAlign: Toward Accurate Retention Time Alignment for Large Cohort Mass Spectrometry Data Analysis

A Three-Stage Search Strategy Combining Database Reduction and Retention Time Filtering to Improve the Sensitivity of Low-Input and Single-Cell Proteomic Analysis.

DeepRT: deep learning for peptide retention time prediction in proteomics

Graph-based peak alignment algorithms for multiple liquid chromatography-mass spectrometry datasets

Novel Peak Shift Correction Method Based on the Retention Index for Peak Alignment in Untargeted Metabolomics

Insights into predicting small molecule retention times in liquid chromatography using deep learning

Retention Time Trajectory Matching for Peak Identification in Chromatographic Analysis

Joint corresponding feature identification and alignment for multiple LC/MS replicates

Retention time trajectory matching for target compound peak identification in chromatographic analysis

RT-Ensemble Pred: a tool for retention time prediction of metabolites on different LC-MS systems

A new platform for untargeted UHPLC-HRMS data analysis to address the time-shift problem

Combining peak- and chromatogram-based retention time alignment algorithms for multiple chromatography-mass spectrometry datasets

Peak Alignment of Gas Chromatography-Mass Spectrometry Data with Deep Learning

Retention time prediction for small samples based on integrating molecular representations and adaptive network

Novel Strategy for Mining and Identification of Acylcarnitines Using Data-Independent-Acquisition-Based Retention Time Prediction Modeling and Pseudo-Characteristic Fragmentation Ion Matching.

High Tolerance to Instrument Drifts by Differential Chemical Isotope Labeling LC-MS: A Case Study of the Effect of LC Leak in Long-Term Sample Runs on Quantitative Metabolome Analysis.

metabCombiner 2.0: Disparate Multi-Dataset Feature Alignment for LC-MS Metabolomics

Retention Time of Peptides in Liquid Chromatography Is Well Estimated upon Deep Transfer Learning

Enhancing compound confidence in suspect and non-target screening through machine learning-based retention time prediction

Visualization, Quantification, and Alignment of Spectral Drift in Population Scale Untargeted Metabolomics Data

RTM-align: an improved RNA alignment tool with enhanced short sequence performance via post-standardization and fragment alignment