Comparative analysis of MS/MS search algorithms in label-free shotgun proteomics for monitoring host-cell proteins using trapped ion mobility and ddaPASEF

Michel Plisnier, Somar Khalil
DOI: https://doi.org/10.1101/2024.11.03.621185
2024-11-03
Abstract: Host cell proteins (HCPs) are critical quality attributes that can impact the safety, efficacy, and quality of biotherapeutics. Label-free shotgun proteomics is a vital approach for HCP monitoring, yet the choice of tandem mass spectrometry (MS/MS) search algorithms directly influences identification depth and quantification reliability. In this study, six prominent MS/MS search tools, Mascot, MaxQuant, SpectroMine, FragPipe, Byos, and PEAKS, were systematically benchmarked for their performance on complex samples spiked with isotopically labeled proteins from Chinese hamster ovary cells, using trapped ion mobility spectrometry and parallel accumulation-serial fragmentation in data-dependent acquisition mode. Key performance metrics, including peptide and protein identifications, data extraction precision, fold-change accuracy, linearity, and measurement trueness, were evaluated. A Bayesian modeling framework with Hamiltonian Monte Carlo sampling was employed to robustly estimate fold-change means and variances, alongside local false discovery rates through posterior probability calibration. Bayesian decision theory, implemented via expected utility maximization, was used to balance accuracy against posterior uncertainty, providing a probabilistic assessment of each tool's performance. Through this cumulative analysis, variability across tools was observed: some excelled in identification sensitivity and protein coverage, others in quantitative accuracy with minimal bias, and a few offered balanced performance across metrics. This study establishes a rigorous, data-driven framework for tool benchmarking, delivering insights for selecting MS/MS tools suited to HCP monitoring in biopharmaceutical development.
Biochemistry
What problem does this paper attempt to address?
The paper addresses how the choice of tandem mass spectrometry (MS/MS) search algorithm affects the monitoring of host cell proteins (HCPs) in label-free quantitative proteomics. Specifically, it benchmarks six commonly used MS/MS search tools (Mascot, MaxQuant, SpectroMine, FragPipe, Byos, and PEAKS) on complex samples, comparing peptide and protein identifications, data extraction precision, fold-change accuracy, linearity, and measurement trueness.

### Background

Host cell proteins (HCPs) are endogenous impurities that arise during the production of recombinant therapeutic drugs and may affect the safety, efficacy, and quality of biopharmaceuticals. Strict monitoring and accurate quantification of HCPs are therefore crucial. Label-free quantification (LFQ) has become a powerful tool in shotgun proteomics for identifying and quantifying HCPs, but low-abundance HCPs and a wide dynamic range make detection challenging.

### Research Objectives

1. **Evaluate the performance of different MS/MS search tools**: Systematically compare six commonly used MS/MS search tools to evaluate their identification and quantification capabilities in complex samples.
2. **Optimize HCP monitoring methods**: Use trapped ion mobility spectrometry (TIMS) and parallel accumulation-serial fragmentation in data-dependent acquisition mode (ddaPASEF) to improve the detection sensitivity and coverage of HCPs.
3. **Provide data-driven tool selection recommendations**: Establish a rigorous framework based on Bayesian inference with Hamiltonian Monte Carlo (HMC) sampling to evaluate the performance of each tool and provide a basis for tool selection in HCP monitoring during biopharmaceutical development.
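
To make the third objective more concrete, the following is a minimal, hand-rolled HMC sketch that estimates the mean and standard deviation of replicate log2 fold-changes. It is not the authors' model: the normal likelihood, the priors, the step size, the number of leapfrog steps, and the replicate values are all assumptions chosen for illustration, and a real analysis would more likely rely on an established probabilistic programming package.

```python
import math
import random

def log_post_and_grad(mu, log_sigma, y):
    """Unnormalized log posterior and gradient for y_i ~ Normal(mu, sigma).

    Assumed priors (illustrative only): mu ~ Normal(0, 10),
    log_sigma ~ Normal(0, 2).
    """
    n = len(y)
    inv_var = math.exp(-2.0 * log_sigma)
    ss = sum((v - mu) ** 2 for v in y)
    lp = (-n * log_sigma - 0.5 * ss * inv_var
          - 0.5 * (mu / 10.0) ** 2 - 0.5 * (log_sigma / 2.0) ** 2)
    d_mu = sum(v - mu for v in y) * inv_var - mu / 100.0
    d_ls = -n + ss * inv_var - log_sigma / 4.0
    return lp, (d_mu, d_ls)

def hmc_draws(y, n_draws=2000, n_warmup=500, step=0.02, n_leapfrog=8, seed=1):
    """Return posterior draws (mu, sigma) using HMC with a unit mass matrix."""
    rng = random.Random(seed)
    mean_y = sum(y) / len(y)
    sd_y = math.sqrt(sum((v - mean_y) ** 2 for v in y) / (len(y) - 1))
    q = [mean_y, math.log(sd_y)]          # start at the sample estimates
    lp, grad = log_post_and_grad(q[0], q[1], y)
    draws = []
    for it in range(n_warmup + n_draws):
        p = [rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)]
        h_old = -lp + 0.5 * (p[0] ** 2 + p[1] ** 2)
        q_new, g_new, lp_new = list(q), grad, lp
        # Leapfrog integration: half momentum step, then alternating full steps.
        p = [p[k] + 0.5 * step * g_new[k] for k in range(2)]
        for s in range(n_leapfrog):
            q_new = [q_new[k] + step * p[k] for k in range(2)]
            lp_new, g_new = log_post_and_grad(q_new[0], q_new[1], y)
            scale = 0.5 if s == n_leapfrog - 1 else 1.0
            p = [p[k] + scale * step * g_new[k] for k in range(2)]
        h_new = -lp_new + 0.5 * (p[0] ** 2 + p[1] ** 2)
        # Metropolis correction for the integration error.
        if math.log(1.0 - rng.random()) < h_old - h_new:
            q, grad, lp = q_new, g_new, lp_new
        if it >= n_warmup:
            draws.append((q[0], math.exp(q[1])))
    return draws

# Hypothetical replicate log2 fold-changes for a nominal 2-fold spike (log2 = 1).
replicate_log2_fc = [0.92, 1.10, 1.05, 0.88, 1.21, 0.97,
                     1.02, 1.15, 0.79, 1.08, 0.95, 1.12]

posterior = hmc_draws(replicate_log2_fc)
post_mu = sum(d[0] for d in posterior) / len(posterior)
post_sigma = sum(d[1] for d in posterior) / len(posterior)
print(f"posterior mean log2 FC: {post_mu:.3f}, posterior mean sigma: {post_sigma:.3f}")
```

Sampling `log_sigma` rather than `sigma` keeps the parameter space unconstrained, which is why the sketch needs no boundary handling in the leapfrog steps.
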
### Methods

- **Sample preparation**: Use stable isotope-labeled Chinese hamster ovary (CHO) proteins to prepare antigen-drug substance (DS) samples at different concentrations.
- **Mass spectrometry analysis**: Perform liquid chromatography-mass spectrometry analysis with TIMS in ddaPASEF mode.
- **Data processing**: Process the raw data with the six MS/MS search tools and evaluate key performance indicators such as peptide and protein identifications, data extraction precision, and fold-change accuracy.
- **Statistical analysis**: Use Bayesian inference with HMC sampling to evaluate the performance of each tool, and apply Bayesian decision theory to balance accuracy against posterior uncertainty.

### Results

- **Protein and peptide identification**: PEAKS performs excellently in protein identification, while Byos leads in peptide identification.
- **Data extraction precision**: FragPipe performs best in data extraction precision, followed by SpectroMine.
- **Fold-change accuracy**: In low-concentration samples, FragPipe, Byos, and SpectroMine perform excellently; in high-concentration samples, SpectroMine has the highest fold-change accuracy.
- **Linearity and measurement trueness**: Byos performs best in linearity and measurement trueness at the protein level, followed by PEAKS.

### Conclusion

The evaluation shows that the MS/MS search tools differ substantially in identification sensitivity, quantitative accuracy, and other performance indicators. These results give the biopharmaceutical industry a data-driven basis for tool selection and help optimize HCP monitoring methods.
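
The decision-theoretic step mentioned under statistical analysis can be illustrated with a short sketch of expected utility maximization. The utility function here (negative squared error against the expected log2 fold-change), the tool labels, and the posterior draws are all hypothetical; the summary does not state which utility the authors used.

```python
from statistics import fmean

def expected_utility(draws, expected_log2_fc):
    """Monte Carlo estimate of E[u(theta)] with u = -(theta - expected)^2.

    Under this assumed utility, the expectation equals -(bias^2 + variance),
    so it penalizes both inaccuracy and posterior uncertainty.
    """
    return fmean([-(d - expected_log2_fc) ** 2 for d in draws])

def best_tool(posterior_draws, expected_log2_fc):
    """Return the label with the highest expected utility, plus all scores."""
    scores = {name: expected_utility(draws, expected_log2_fc)
              for name, draws in posterior_draws.items()}
    return max(scores, key=scores.get), scores

# Hypothetical posterior draws of log2 fold-change for three unnamed tools.
posterior_draws = {
    "tool_A": [0.98, 1.02, 1.00, 0.99, 1.01],  # accurate and precise
    "tool_B": [1.20, 1.22, 1.18, 1.21, 1.19],  # precise but biased
    "tool_C": [0.70, 1.30, 1.00, 0.80, 1.20],  # unbiased but noisy
}

winner, scores = best_tool(posterior_draws, expected_log2_fc=1.0)
for name, score in scores.items():
    print(f"{name}: expected utility = {score:.4f}")
print("selected:", winner)
```

With a squared-error utility, a tool with a small bias and a tight posterior can outrank an unbiased but highly variable one, which is the trade-off between accuracy and posterior uncertainty that the summary describes.
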