Integration of single-cell proteomic datasets through distinctive proteins in cell clusters

Mehmet Burak Koca,Fatih Erdoğan Sevilgen
DOI: https://doi.org/10.1002/pmic.202300282
PROTEOMICS
Abstract:The use of mass spectrometry and antibody-based sequencing technologies at the single-cell level has led to an increase in single-cell proteomic datasets. Integrating these datasets is crucial to eliminate the batch effect that often arises due to their limited sequencing molecules. Although methods for horizontally integrating high-dimensional single-cell transcriptomic datasets can also be applied to single-cell proteomic datasets, a specialized approach explicitly tailored for low-dimensional proteomic datasets may enhance the integration process. Here, we introduce SCPRO-HI, an algorithm for the horizontal integration of antibody-based single-cell proteomic datasets. It utilizes a hierarchical cell anchoring technique to match cells based on the similarity of distinctive proteins for constituting cell clusters. A novel variational auto-encoder model is employed for correcting batch effects on the protein abundances, eliminating the need for mapping them into a new domain. Moreover, we propose a technique for extending the algorithm to high-dimensional datasets. The performance of the SCPRO-HI algorithm is evaluated using simulated and real-world single-cell proteomic datasets. The findings demonstrate our algorithm outperforms state-of-the-art methods, achieving a 75% higher silhouette score while preserving HVPs 13% better. Furthermore, the algorithm shows competitive performance in transcriptomic datasets, suggesting potential for integrating high-dimensional mass-spectrometry-based proteomic datasets.
What problem does this paper attempt to address?