scMusketeers: Addressing imbalanced cell type annotation and batch effect reduction with a modular autoencoder

Antoine Collin, Simon J Pelletier, Morgane Fierville, Arnaud Droit, Frederic Precioso, Christophe Becavin, Pascal Barbry
DOI: https://doi.org/10.1101/2024.12.15.628538
2024-12-17
Abstract: The increasing number of available single-cell gene expression atlases represents a potential revolution in understanding physio-pathological processes. To fully leverage this single-cell revolution, we need to enhance data integration and cell annotation strategies, with a particular emphasis on addressing the challenges posed by imbalanced cell type proportions and substantial batch effects. scMusketeers, a deep learning model, optimizes the latent data representation and addresses these challenges simultaneously. scMusketeers features three neural modules: (1) an autoencoder for noise and dimensionality reduction; (2) a focal loss classifier to enhance rare cell type predictions; and (3) an adversarial domain adaptation (DANN) module for batch effect correction. Benchmarking against state-of-the-art tools, including the UCE foundation model, showed that scMusketeers performs on par or better, particularly in identifying rare cell types. It also enables the transfer of cell labels from single-cell RNA sequencing to spatial transcriptomics. With its modular and adaptable design, scMusketeers offers a versatile framework that can be generalized to other large-scale biological projects requiring deep learning approaches, establishing itself as a valuable tool for single-cell data integration and analysis.
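To illustrate why a focal loss classifier can help with rare cell types, here is a minimal sketch of the standard focal loss, -(1 - p_t)^gamma * log(p_t), in plain Python. It is not the scMusketeers implementation: the function names, the example probabilities, and the choice gamma = 2.0 are assumptions made for illustration, and the abstract does not state which focal loss variant or hyperparameters the authors use.

```python
import math


def focal_loss(probs, label, gamma=2.0):
    """Standard focal loss for one cell: -(1 - p_t)^gamma * log(p_t).

    probs : predicted class probabilities for this cell (sums to 1)
    label : index of the true cell type
    gamma : focusing parameter (2.0 is an illustrative assumption)
    """
    # Clamp p_t to avoid log(0) on a completely wrong prediction.
    p_t = min(max(probs[label], 1e-12), 1.0)
    return -((1.0 - p_t) ** gamma) * math.log(p_t)


def cross_entropy(probs, label):
    """Plain cross-entropy is the gamma = 0 special case of focal loss."""
    return focal_loss(probs, label, gamma=0.0)


# Hypothetical predictions; the true cell type is index 0 in both cases.
easy = [0.95, 0.03, 0.02]  # confidently correct, typical of an abundant type
hard = [0.20, 0.50, 0.30]  # poorly classified, typical of a rare type

# Fraction of the cross-entropy loss that survives focal down-weighting.
easy_ratio = focal_loss(easy, 0) / cross_entropy(easy, 0)
hard_ratio = focal_loss(hard, 0) / cross_entropy(hard, 0)

print(f"easy cell keeps {easy_ratio:.4f} of its cross-entropy loss")
print(f"hard cell keeps {hard_ratio:.4f} of its cross-entropy loss")
```

The modulating factor (1 - p_t)^gamma shrinks the contribution of confidently classified cells far more than that of misclassified ones, so the training signal concentrates on hard examples, which in imbalanced data tend to come from rare cell types.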
Biology
What problem does this paper attempt to address?