Abstract: Mass spectrometry (MS)-based proteomics continues to evolve rapidly, opening more and more application areas. The scale of data generated on novel instrumentation and acquisition strategies poses a challenge to bioinformatic analysis. Search engines need to make optimal use of the data for biological discoveries while remaining statistically rigorous, transparent and performant. Here we present alphaDIA, a modular open-source search framework for data-independent acquisition (DIA) proteomics. We developed a feature-free identification algorithm particularly suited for detecting patterns in data produced by sensitive time-of-flight instruments. It naturally adapts to novel, more efficient scan modes that are not yet accessible to previous algorithms. Rigorous benchmarking demonstrates competitive identification and quantification performance. While supporting empirical spectral libraries, we propose a new search strategy named end-to-end transfer learning using fully predicted libraries. This entails continuously optimizing a deep neural network for predicting machine- and experiment-specific properties, enabling the generic DIA analysis of any post-translational modification (PTM). AlphaDIA provides a high-performance and accessible framework running locally or in the cloud, opening DIA analysis to the community.

What problem does this paper attempt to address?

The paper addresses the difficulty of processing complex data in data-independent acquisition (DIA) proteomics. The authors propose a new framework, alphaDIA, which targets the shortcomings of existing methods in five ways:

1. **Improving flexibility and performance in data processing**: Existing DIA data processing algorithms typically rely on specific instruments and experimental methods, and most are closed-source software. alphaDIA, in contrast, is an open-source, modular framework that can adapt to different types of DIA data, including high-dimensional time-of-flight (TOF) data.
2. **Feature-free identification algorithm**: Traditional DIA data processing methods usually require a feature extraction step first, which may lose information. alphaDIA adopts a feature-free approach that performs machine learning directly on the raw signals, so it can better handle noisy data and complex spectra.
3. **End-to-end transfer learning**: alphaDIA introduces an end-to-end transfer learning strategy in which a deep neural network is continuously optimized to predict spectral libraries for the specific instrument and experimental conditions, enabling generic DIA analysis of any post-translational modification (PTM).
4. **Improving quantitative accuracy and depth**: By combining deep learning with efficient search algorithms, alphaDIA aims for accurate protein quantification in large-scale datasets and deep characterization of complex protein mixtures.
5. **Supporting multiple instrument platforms**: alphaDIA is not limited to TOF detectors; it can also process data from mass spectrometers of different manufacturers and types, including Orbitrap instruments and SWATH acquisitions.

In summary, the main goal of the paper is a high-performance, flexible, and open DIA data processing framework that can cope with the increasingly complex, high-throughput data of modern proteomics research.
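The feature-free idea mentioned above can be illustrated with a small sketch. It is not alphaDIA's algorithm or API: every name here (`extract_trace`, `coelution_score`, the 15 ppm tolerance, the example scans) is a hypothetical choice made for illustration. The sketch assumes that a candidate peptide is scored by summing raw intensity in narrow m/z windows around its expected fragment masses in every scan, then checking whether the resulting traces rise and fall together, with no prior peak-picking step.

```python
from itertools import combinations
from typing import List, Sequence, Tuple

# One raw scan as (m/z, intensity) pairs; a hypothetical, simplified data model.
Scan = List[Tuple[float, float]]


def extract_trace(scans: Sequence[Scan], target_mz: float, tol_ppm: float = 15.0) -> List[float]:
    """Sum the raw intensity inside a ppm window around target_mz for each scan."""
    tol = target_mz * tol_ppm * 1e-6
    return [sum(i for mz, i in scan if abs(mz - target_mz) <= tol) for scan in scans]


def pearson(a: Sequence[float], b: Sequence[float]) -> float:
    """Pearson correlation; returns 0.0 when either trace is flat."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        return 0.0
    return cov / (var_a * var_b) ** 0.5


def coelution_score(scans: Sequence[Scan], fragment_mzs: Sequence[float], tol_ppm: float = 15.0) -> float:
    """Mean pairwise correlation of the raw fragment traces (no peak picking)."""
    traces = [extract_trace(scans, mz, tol_ppm) for mz in fragment_mzs]
    pairs = list(combinations(range(len(traces)), 2))
    if not pairs:
        return 0.0
    return sum(pearson(traces[i], traces[j]) for i, j in pairs) / len(pairs)


# Made-up example: two fragments share an elution profile, a third signal is constant background.
profile = [1.0, 4.0, 9.0, 4.0, 1.0]
example_scans = [[(500.25, p), (600.30, 2.0 * p), (700.00, 5.0)] for p in profile]

print(coelution_score(example_scans, [500.25, 600.30]))  # co-eluting fragments
print(coelution_score(example_scans, [500.25, 700.00]))  # one trace is flat background
```

In a real search engine a score like this would be only one input among many to a learned classifier; the point of the sketch is merely that the evidence is computed from raw signal windows rather than from previously extracted features.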

AlphaDIA enables End-to-End Transfer Learning for Feature-Free Proteomics
