scDecorr - Feature decorrelation representation learning with domain adaptation enables self-supervised alignment of multiple single-cell experiments

Ritabrata Sanyal, Yang Xu, Hyojin Kim, Rafael Kramann, Sikander Hayat
DOI: https://doi.org/10.1101/2024.05.17.594763
2024-05-21
Abstract: Single-cell RNA sequencing (scRNA-seq) has revolutionized our understanding of cellular heterogeneity in complex biological systems. However, analyzing and integrating scRNA-seq data poses unique computational challenges due to sparsity, high variability, and technical batch effects. Here, we propose a novel framework called scDecorr for robust representation learning and data integration in scRNA-seq analysis. Our approach leverages feature decorrelation-based self-supervised learning (SSL) to obtain efficient low-dimensional representations of individual cells without relying on negative samples. By maximizing the similarity between embeddings of distorted versions of each cell while decorrelating the components of those embeddings, scDecorr captures the biological signature while eliminating technical noise. Furthermore, scDecorr incorporates unsupervised domain adaptation to bridge the gap between batches with different probability distributions, enabling effective integration of scRNA-seq data from diverse sources. Our framework achieves domain-invariant representations by learning cell embeddings independently across domains and employing domain-specific batch normalization. We evaluate scDecorr on a variety of single-cell datasets and demonstrate its ability to integrate batches without losing the inherent biological variance, thereby facilitating optimal clustering. The representations generated by scDecorr are also robust in label transfer tasks, allowing effective transfer of cell-type labels from reference to query datasets. Overall, scDecorr offers a powerful tool for the efficient analysis and integration of large, complex scRNA-seq datasets, advancing our understanding of cellular processes and disease mechanisms. The code is available at https://github.com/hayatlab/scdecorr.
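The abstract names two mechanisms: a negative-free, feature-decorrelation SSL objective and domain-specific batch normalization. The following is a minimal NumPy sketch of both, assuming a Barlow Twins-style cross-correlation loss (the abstract does not name the exact objective). The distortion `augment_cells`, the functions `decorrelation_loss` and `domain_specific_batchnorm`, the weight `lambd`, and the linear stand-in encoder are all hypothetical illustrations, not the scDecorr API; the authors' implementation is in the linked repository.

```python
import numpy as np

rng = np.random.default_rng(0)

def augment_cells(x, mask_prob=0.2, noise_std=0.1):
    """Hypothetical distortion: randomly zero out genes, then add Gaussian noise."""
    keep = rng.random(x.shape) > mask_prob
    return x * keep + rng.normal(0.0, noise_std, size=x.shape)

def decorrelation_loss(z_a, z_b, lambd=5e-3, eps=1e-9):
    """Assumed Barlow Twins-style loss on two views, each of shape (n_cells, dim)."""
    n = z_a.shape[0]
    # Standardize every embedding dimension across the batch of cells.
    z_a = (z_a - z_a.mean(axis=0)) / (z_a.std(axis=0) + eps)
    z_b = (z_b - z_b.mean(axis=0)) / (z_b.std(axis=0) + eps)
    c = z_a.T @ z_b / n                              # (dim, dim) cross-correlation matrix
    diag = np.diag(c)
    invariance = np.sum((1.0 - diag) ** 2)           # matching dimensions should agree
    redundancy = np.sum(c ** 2) - np.sum(diag ** 2)  # different dimensions should decorrelate
    return invariance + lambd * redundancy

def domain_specific_batchnorm(h, domains, params, eps=1e-5):
    """Normalize each domain (batch) with its own statistics and affine parameters."""
    out = np.empty_like(h)
    for d in np.unique(domains):
        idx = domains == d
        mu, var = h[idx].mean(axis=0), h[idx].var(axis=0)
        gamma, beta = params[d]
        out[idx] = gamma * (h[idx] - mu) / np.sqrt(var + eps) + beta
    return out

# Toy data: 200 cells x 50 genes drawn from two hypothetical batches.
x = rng.poisson(2.0, size=(200, 50)).astype(float)
x[100:] *= 1.5                                # simulated batch effect on the second batch
domains = np.repeat([0, 1], 100)
w = rng.normal(size=(50, 16)) / np.sqrt(50)   # stand-in linear encoder weights
params = {d: (np.ones(16), np.zeros(16)) for d in (0, 1)}

def encode(v):
    # Shared encoder followed by per-domain normalization.
    return domain_specific_batchnorm(v @ w, domains, params)

loss = decorrelation_loss(encode(augment_cells(x)), encode(augment_cells(x)))
print(f"loss = {loss:.3f}")
```

Because each domain is standardized with its own mean and variance, a global per-batch scaling (the 1.5x factor above) is removed at the normalization layer instead of having to be absorbed by the shared encoder. A trainable version would also keep running statistics per domain for inference, which this sketch omits.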
Bioinformatics