Shared Differential Clustering across Single-cell RNA Sequencing Datasets with the Hierarchical Dirichlet Process

Jinlu Liu, Sara Wade, Natalia Bochkina
2023-12-13
Abstract: Single-cell RNA sequencing (scRNA-seq) is a powerful technology that allows researchers to understand gene expression patterns at the single-cell level. However, analysing scRNA-seq data is challenging due to issues and biases in data collection. In this work, we construct an integrated Bayesian model that simultaneously addresses normalization, imputation and batch effects, and also nonparametrically clusters cells into groups across multiple datasets. A Gibbs sampler based on a finite-dimensional approximation of the hierarchical Dirichlet process (HDP) is developed for posterior inference.
Genomics, Methodology
What problem does this paper attempt to address?
### Problems Addressed by the Paper

This paper aims to address multiple challenges in the analysis of single-cell RNA sequencing (scRNA-seq) data and proposes a new method to integrate multiple datasets for shared clustering. Specifically:

1. **Gene Expression Pattern Analysis**:
   - Single-cell RNA sequencing technology enables researchers to understand gene expression patterns at the single-cell level and reveal cellular heterogeneity.
2. **Clustering Analysis**:
   - Clustering is an important tool in scRNA-seq analysis, used to discover groups of cells with similar gene expression patterns and identify potential cell types.
3. **Integration of Multiple Datasets**:
   - Integrating multiple scRNA-seq datasets is an urgent challenge. This paper develops a new model that extends clustering methods to appropriately combine inference results across multiple datasets.
4. **Handling Noise and Uncertainty**:
   - Through a hierarchical Bayesian framework, the model simultaneously addresses normalization issues, handles the inherent noise and uncertainty in scRNA-seq data, and infers cell types.
5. **Experimental Motivation**:
   - The research is inspired by embryonic cell experimental data, aiming to understand the role of the transcription factor PAX6 in embryonic development, particularly the changes in cell subtypes and their proportions after knocking out this factor.

In summary, this paper primarily addresses how to effectively integrate multiple datasets in single-cell RNA sequencing data analysis and, on this basis, conduct research on cell type identification and proportion changes.
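
The abstract mentions a finite-dimensional approximation of the HDP as the basis for posterior inference. As a rough illustration of what such an approximation can look like, the sketch below draws a set of global cluster weights shared by all datasets, then draws dataset-specific weights centred on them, so that clusters are shared across datasets while their proportions differ. This is a commonly used finite Dirichlet construction and is an assumption here, not the paper's exact specification. The function name `sample_finite_hdp_weights`, the truncation level `n_components`, and the concentration parameters `alpha0` and `alpha` are hypothetical, and the code only simulates from the prior; it is not the authors' Gibbs sampler.

```python
import numpy as np


def sample_finite_hdp_weights(n_datasets, n_components, alpha0, alpha, rng):
    """Draw shared and dataset-specific cluster weights (finite HDP sketch).

    Assumed construction (not taken from the paper):
        beta        ~ Dirichlet(alpha0 / J, ..., alpha0 / J)
        pi_d | beta ~ Dirichlet(alpha * beta),  d = 1, ..., D
    where J = n_components and D = n_datasets.
    """
    # Global weights over J components, shared by every dataset.
    beta = rng.dirichlet(np.full(n_components, alpha0 / n_components))
    # Keep every entry strictly positive so it is a valid Dirichlet parameter.
    beta = np.clip(beta, 1e-12, None)
    beta /= beta.sum()
    # Dataset-specific weights: same components, different proportions.
    pi = np.vstack([rng.dirichlet(alpha * beta) for _ in range(n_datasets)])
    return beta, pi


rng = np.random.default_rng(0)
n_datasets, n_components = 2, 10
beta, pi = sample_finite_hdp_weights(
    n_datasets, n_components, alpha0=10.0, alpha=20.0, rng=rng
)

# Allocate 100 hypothetical cells per dataset to clusters using pi_d.
z = [rng.choice(n_components, size=100, p=pi[d]) for d in range(n_datasets)]
```

Because every dataset draws its allocations from the same `n_components` clusters, a cluster label means the same thing in each dataset, and comparing the rows of `pi` gives a direct view of how cluster proportions differ between datasets (for example, between control and knockout samples).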