Joint Modeling of Cellular Heterogeneity and Condition Effects with scPCA in Single-Cell RNA-Seq

Harald Sager Vohringer, Sascha Dietrich
DOI: https://doi.org/10.1101/2024.09.22.614322
2024-09-25
Abstract: Single-cell RNA sequencing (scRNA-seq) in multi-condition experiments enables the systematic assessment of treatment effects. Analyzing scRNA-seq data relies on linear dimensionality reduction (DR) methods like principal component analysis (PCA). These methods decompose high-dimensional gene expression profiles into tractable factor representations and prototypical gene expression patterns (components), facilitating the study of cell type variation. However, integrating study covariates within linear DR frameworks remains a challenging task. We present scPCA, a flexible DR framework that jointly models cellular heterogeneity and conditioning variables, allowing it to recover an integrated factor representation and reveal transcriptional changes across conditions and components of the decomposition. We show that scPCA extracts an interpretable latent representation by analyzing unstimulated and IFN-beta-treated PBMCs, and showcase that the model may be employed to effectively address batch effects. We examine age-related changes in rodent lung cell populations, uncovering a previously unreported surge in Ccl5 expression in T cells. We illustrate how scPCA may be employed to identify coordinated transcriptional changes across multiple time-points in depolarized visual cortex neurons. Finally, we show that scPCA elucidates transcriptional shifts in CRISPR-Cas9 chordin knockout zebrafish single-cell data despite large differences in cell abundance across conditions. Since scPCA introduces a general approach to account for conditioning variables in high-dimensional data, it may also be applicable to datasets other than scRNA-seq.
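For readers unfamiliar with the linear DR step the abstract refers to, the following is a minimal sketch of ordinary PCA computed via SVD, splitting a cells-by-genes matrix into per-cell factors and per-gene components. It illustrates standard PCA only, not scPCA or the authors' implementation; the function name `pca_factors` and the simulated toy matrix are assumptions made for illustration.

```python
import numpy as np

def pca_factors(X, n_components):
    """Plain PCA via SVD: X (cells x genes) ~ factors @ components + mean.

    Hypothetical helper for illustration; not part of scPCA.
    """
    mean = X.mean(axis=0)                 # per-gene mean expression
    Xc = X - mean                         # center each gene
    U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    factors = U[:, :n_components] * S[:n_components]   # cells x k
    components = Vt[:n_components]                      # k x genes
    return factors, components, mean

# Toy data (simulated, not real scRNA-seq): 200 cells x 50 genes,
# generated from 3 latent factors plus a small amount of noise.
rng = np.random.default_rng(0)
Z = rng.normal(size=(200, 3))
W = rng.normal(size=(3, 50))
X = Z @ W + 0.1 * rng.normal(size=(200, 50))

factors, components, mean = pca_factors(X, n_components=3)

# Rank-3 reconstruction and its relative error on the centered data.
X_hat = factors @ components + mean
rel_err = np.linalg.norm(X - X_hat) / np.linalg.norm(X - mean)
print(factors.shape, components.shape)
```

Each row of `components` is a gene expression pattern and each row of `factors` gives one cell's coordinates along those patterns. Plain PCA has no notion of condition or batch, which is the gap the abstract says scPCA is meant to address.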
Bioinformatics