A robust model for cell type-specific interindividual variation in single-cell RNA sequencing data

Minhui Chen, Andy Dahl
DOI: https://doi.org/10.1038/s41467-024-49242-9
IF: 16.6
2024-06-21
Nature Communications
Abstract: Single-cell RNA sequencing (scRNA-seq) has been widely used to characterize cell types based on their average gene expression profiles. However, most studies do not consider cell type-specific variation across donors. Modelling this cell type-specific inter-individual variation could help elucidate cell type-specific biology and inform genes and cell types underlying complex traits. We therefore develop a new model to detect and quantify cell type-specific variation across individuals called CTMM (Cell Type-specific linear Mixed Model). We use extensive simulations to show that CTMM is powerful and unbiased in realistic settings. We also derive calibrated tests for cell type-specific interindividual variation, which is challenging given the modest sample sizes in scRNA-seq. We apply CTMM to scRNA-seq data from human induced pluripotent stem cells to characterize the transcriptomic variation across donors as cells differentiate into endoderm. We find that almost 100% of transcriptome-wide variability between donors is differentiation stage-specific. CTMM also identifies individual genes with statistically significant stage-specific variability across samples, including 85 genes that do not have significant stage-specific mean expression. Finally, we extend CTMM to partition interindividual covariance between stages, which recapitulates the overall differentiation trajectory. Overall, CTMM is a powerful tool to illuminate cell type-specific biology in scRNA-seq.
multidisciplinary sciences
What problem does this paper attempt to address?
This paper addresses how to model and quantify cell type-specific variation across individuals in single-cell RNA sequencing (scRNA-seq) data. Most existing studies characterize cell types by their average gene expression profiles and ignore how cell type-specific expression varies across donors. Modelling this interindividual variation could help elucidate cell type-specific biology and identify the genes and cell types underlying complex traits. The authors therefore develop a new model, the Cell Type-specific linear Mixed Model (CTMM), to detect and quantify cell type-specific variation across individuals. Specifically, the goals of the paper are:
1. **Develop the CTMM model**: Detect and quantify cell type-specific interindividual variation in scRNA-seq data.
2. **Evaluate the performance of CTMM**: Use extensive simulations to show that CTMM is powerful and unbiased in realistic settings, and derive calibrated tests that remain valid at the modest sample sizes typical of scRNA-seq.
3. **Apply CTMM**: Apply CTMM to scRNA-seq data from human induced pluripotent stem cells (iPSCs) differentiating into endoderm to characterize transcriptomic variation across donors.
4. **Extend CTMM**: Extend CTMM to partition interindividual covariance between differentiation stages, which recapitulates the overall differentiation trajectory.

Through these goals, the paper aims to provide a powerful tool for illuminating cell type-specific biology in scRNA-seq data.
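The distinction between donor variation that is shared across cell types and donor variation that is cell type-specific can be illustrated with a toy simulation. The summary does not give CTMM's equations or its fitting procedure, so the following is only a minimal sketch under stated assumptions: one gene, a donor effect shared by all cell types with variance `sigma2_shared`, an independent donor effect per cell type with variances `v_specific`, and no measurement noise. The function names `simulate_pseudobulk` and `moment_estimates` are hypothetical, and the simple method-of-moments estimator is a stand-in, not the paper's mixed-model fitting.

```python
import numpy as np


def simulate_pseudobulk(n_donors, beta, sigma2_shared, v_specific, rng):
    """Simulate donor-by-cell-type mean expression for one gene (toy model)."""
    beta = np.asarray(beta, dtype=float)              # cell type mean expression
    v_specific = np.asarray(v_specific, dtype=float)  # cell type-specific variances
    n_types = beta.size
    # Donor effect shared by every cell type (one draw per donor).
    alpha = rng.normal(0.0, np.sqrt(sigma2_shared), size=(n_donors, 1))
    # Donor effect drawn independently for each cell type.
    gamma = rng.normal(0.0, np.sqrt(v_specific), size=(n_donors, n_types))
    return beta + alpha + gamma


def moment_estimates(y):
    """Split donor variance into shared and cell type-specific parts.

    Under the toy model, the covariance across donors between two different
    cell types equals the shared variance, and each cell type's variance
    equals the shared variance plus its own specific variance.
    """
    cov = np.cov(y, rowvar=False)                     # cell type x cell type
    n_types = cov.shape[0]
    off_diagonal = cov[~np.eye(n_types, dtype=bool)]
    sigma2_hat = off_diagonal.mean()
    v_hat = np.diag(cov) - sigma2_hat
    return sigma2_hat, v_hat


rng = np.random.default_rng(0)
y = simulate_pseudobulk(5000, [1.0, 2.0, 3.0], 0.5, [0.1, 0.4, 0.2], rng)
sigma2_hat, v_hat = moment_estimates(y)
print("shared variance estimate:", round(float(sigma2_hat), 3))
print("cell type-specific variance estimates:", np.round(v_hat, 3))
```

With real scRNA-seq data the cell type means are themselves estimated from a finite number of cells, so a realistic analysis would also need to account for that measurement noise, which this sketch ignores.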