Integrated View of Baseline Protein Expression in Human Tissues using public Data Independent Acquisition datasets

Ananth Prakash, Andrew Collins, Liora Vilmovsky, Silvie Fexova, Andrew R. Jones, Juan Antonio Vizcaino
DOI: https://doi.org/10.1101/2024.09.16.613191
2024-09-19
Abstract: The PRIDE database is the largest public repository of mass spectrometry-based proteomics data and currently stores more than 40,000 datasets covering a wide range of organisms, experimental techniques and biological conditions. During the past few years, PRIDE has seen an increase in the number of submitted Data-Independent Acquisition (DIA) proteomics datasets, in parallel with trends in the field. This provides an excellent opportunity for large-scale data reanalysis and reuse. We have systematically reanalysed 15 public label-free DIA datasets across various healthy human tissues to provide a state-of-the-art, DIA-based view of the human proteome in baseline conditions (without any perturbations). We computed baseline protein abundances and compared them across tissues, samples and datasets. Our second aim was to compare these protein abundances with those obtained from previous analyses of human baseline Data-Dependent Acquisition (DDA) datasets. Results were heterogeneous. On the one hand, we observed a good correlation in some tissues, especially liver and colon. On the other hand, correlations were weak in others, such as lung and pancreas. This most likely reflects differences in sample preparation and processing for the datasets derived from these tissues, rather than fundamental differences between DDA and DIA proteomics. The reanalysed results, including protein abundances and curated metadata, are available to view and download from the Expression Atlas resource.
Bioinformatics