Population-level Integration of Single-Cell Datasets Enables Multi-Scale Analysis Across Samples

Carlo De Donno, Soroor Hediyeh-Zadeh, Amir Ali Moinfar, Marco Wagenstetter, Luke Zappia, Mohammad Lotfollahi, Fabian J. Theis
DOI: https://doi.org/10.1038/s41592-023-02035-2
IF: 48
2023-01-01
Nature Methods
Abstract: The increasing generation of population-level single-cell atlases has the potential to link sample metadata with cellular data. Constructing such references requires integration of heterogeneous cohorts with varying metadata. Here we present single-cell population level integration (scPoli), an open-world learner that incorporates generative models to learn sample and cell representations for data integration, label transfer and reference mapping. We applied scPoli on population-level atlases of lung and peripheral blood mononuclear cells, the latter consisting of 7.8 million cells across 2,375 samples. We demonstrate that scPoli can explain sample-level biological and technical variations using sample embeddings revealing genes associated with batch effects and biological effects. scPoli is further applicable to single-cell sequencing assay for transposase-accessible chromatin and cross-species datasets, offering insights into chromatin accessibility and comparative genomics. We envision scPoli becoming an important tool for population-level single-cell data integration facilitating atlas use but also interpretation by means of multi-scale analyses.
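The abstract names two mechanisms, per-sample embeddings and reference-based label transfer, without spelling out how they work. The toy sketch below illustrates one plausible reading in plain Python. Each sample ID is mapped to a dense vector that is concatenated to a cell's expression profile in place of a one-hot batch code. Query cells then receive the label of the nearest cell-type prototype, taken here as the mean latent vector of reference cells of that type. All function names, dimensions, the hand-written latent vectors and the nearest-prototype rule are assumptions for illustration. This is not the scPoli implementation, in which embeddings and the latent space are learned by training a generative model.

```python
import math
import random
from collections import defaultdict


def init_sample_embeddings(sample_ids, dim, seed=0):
    """Map each distinct sample ID to a dense vector.

    Hypothetical helper: values are random here; in a trained model they
    would be learned parameters updated during training.
    """
    rng = random.Random(seed)
    return {s: [rng.gauss(0.0, 1.0) for _ in range(dim)] for s in sorted(set(sample_ids))}


def condition_input(expression, sample_id, embeddings):
    """Concatenate a cell's expression vector with its sample's embedding,
    used here in place of a one-hot batch indicator (hypothetical helper)."""
    return list(expression) + embeddings[sample_id]


def compute_prototypes(latents, labels):
    """Prototype of each cell type = mean latent vector of its reference cells."""
    sums = {}
    counts = defaultdict(int)
    for z, y in zip(latents, labels):
        acc = sums.setdefault(y, [0.0] * len(z))
        for i, v in enumerate(z):
            acc[i] += v
        counts[y] += 1
    return {y: [v / counts[y] for v in acc] for y, acc in sums.items()}


def transfer_label(z, prototypes):
    """Return (label, distance) of the prototype nearest to latent z.

    The distance is returned so a caller could, as an illustrative
    heuristic, flag cells far from every prototype as unknown.
    """
    best = min(prototypes, key=lambda y: math.dist(z, prototypes[y]))
    return best, math.dist(z, prototypes[best])


if __name__ == "__main__":
    emb = init_sample_embeddings(["donor1", "donor2", "donor1"], dim=2)
    print(len(condition_input([0.0, 3.0, 1.0], "donor2", emb)))  # 5

    # Latent vectors are written by hand here; a real model would produce them with an encoder.
    ref_latents = [[0.0, 0.0], [1.0, 0.0], [4.0, 4.0], [6.0, 6.0]]
    ref_labels = ["T cell", "T cell", "B cell", "B cell"]
    protos = compute_prototypes(ref_latents, ref_labels)
    print(protos)  # {'T cell': [0.5, 0.0], 'B cell': [5.0, 5.0]}
    print(transfer_label([4.5, 5.5], protos)[0])  # B cell
```

The design choice being illustrated is that a dense per-sample vector has a fixed size regardless of how many samples exist, whereas a one-hot code grows with every new sample; this is why sample embeddings can be compared with one another, as the abstract's sample-level analyses suggest.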
What problem does this paper attempt to address?