Abstract

Motivation: Single cell transcriptome sequencing (scRNA-Seq) has become a revolutionary tool to study cellular and molecular processes at single cell resolution. Among existing technologies, the recently developed droplet-based platform enables efficient parallel processing of thousands of single cells with direct counting of transcript copies using Unique Molecular Identifiers (UMIs). Despite these technological advances, statistical methods and computational tools for analyzing droplet-based scRNA-Seq data are still lacking. In particular, model-based approaches for clustering large-scale single cell transcriptomic data remain under-explored.

Results: We developed DIMM-SC, a Dirichlet Mixture Model for clustering droplet-based Single Cell transcriptomic data. This approach explicitly models UMI count data from scRNA-Seq experiments and characterizes variations across different cell clusters via a Dirichlet mixture prior. We performed comprehensive simulations to evaluate DIMM-SC and compared it with existing clustering methods such as K-means, CellTree and Seurat. In addition, we analyzed public scRNA-Seq datasets with known cluster labels and in-house scRNA-Seq datasets from a study of systemic sclerosis with prior biological knowledge to benchmark and validate DIMM-SC. Both simulation studies and real data applications demonstrated that, overall, DIMM-SC achieves substantially improved clustering accuracy and much lower clustering variability compared to other existing clustering methods. More importantly, as a model-based approach, DIMM-SC is able to quantify the clustering uncertainty for each single cell, facilitating rigorous statistical inference and biological interpretation, which are typically unavailable from existing clustering methods.

Availability and implementation: DIMM-SC has been implemented in a user-friendly R package with a detailed tutorial available on www.pitt.edu/∼wec47/singlecell.html.

Contact: wei.chen@chp.edu or hum@ccf.org.

Supplementary information: Supplementary data are available at Bioinformatics online.
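To make the idea of per-cell clustering uncertainty concrete, here is a minimal sketch in Python. It is not the DIMM-SC R package and does not reproduce its estimation procedure. It assumes that UMI counts in a cell are multinomial given gene proportions and that each cluster places a Dirichlet prior on those proportions, so the marginal likelihood of a cell is Dirichlet-multinomial. The function names, the toy `alphas`, and the mixing `weights` are hypothetical and chosen only for illustration.

```python
import math


def dm_log_likelihood(counts, alpha):
    """Dirichlet-multinomial log-likelihood of one cell's UMI counts.

    The multinomial coefficient is dropped because it is identical for
    every cluster and cancels when memberships are normalized.
    """
    total = sum(counts)
    alpha_sum = sum(alpha)
    ll = math.lgamma(alpha_sum) - math.lgamma(alpha_sum + total)
    for x, a in zip(counts, alpha):
        ll += math.lgamma(x + a) - math.lgamma(a)
    return ll


def posterior_memberships(counts, alphas, weights):
    """Posterior probability that a cell belongs to each cluster.

    alphas  : one Dirichlet parameter vector per cluster (assumed known here)
    weights : mixing proportions of the clusters (assumed known here)
    """
    log_post = [
        math.log(w) + dm_log_likelihood(counts, a)
        for a, w in zip(alphas, weights)
    ]
    # Subtract the maximum before exponentiating for numerical stability.
    m = max(log_post)
    unnorm = [math.exp(lp - m) for lp in log_post]
    z = sum(unnorm)
    return [u / z for u in unnorm]


# Toy example: 3 genes, 2 clusters with mirrored expression profiles.
alphas = [[5.0, 1.0, 1.0], [1.0, 1.0, 5.0]]
weights = [0.5, 0.5]

confident_cell = [8, 1, 0]   # dominated by gene 1, so it favors cluster 1
ambiguous_cell = [1, 1, 1]   # equally compatible with both clusters

probs_confident = posterior_memberships(confident_cell, alphas, weights)
probs_ambiguous = posterior_memberships(ambiguous_cell, alphas, weights)
print(probs_confident, probs_ambiguous)
```

In a full model-based method, the cluster parameters and mixing proportions would be estimated from the data (for example with an EM-type algorithm) rather than fixed by hand; the membership vector computed here is the quantity that expresses how uncertain each cell's assignment is.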

Single-Cell Transcriptome Data Clustering via Multinomial Modeling and Adaptive Fuzzy K-Means Algorithm

DIMM-SC: a Dirichlet mixture model for clustering droplet-based single cell transcriptomic data

Deep soft K-means clustering with self-training for single-cell RNA sequence data

Clustering single-cell RNA-seq data with a model-based deep learning approach

Clustering of single-cell multi-omics data with a multimodal deep learning method

scDAC: deep adaptive clustering of single-cell transcriptomic data with coupled autoencoder and Dirichlet process mixture model

Deep Learning for clustering single-cell RNA-seq Data

Self-supervised deep clustering of single-cell RNA-seq data to hierarchically detect rare cell populations

scDFN: enhancing single-cell RNA-seq clustering with deep fusion networks

Denoising adaptive deep clustering with self-attention mechanism on single-cell sequencing data

Effective multi-modal clustering method via skip aggregation network for parallel scRNA-seq and scATAC-seq data

scDFC: A deep fusion clustering method for single-cell RNA-seq data

scASDC: Attention Enhanced Structural Deep Clustering for Single-cell RNA-seq Data

Deep structural clustering for single-cell RNA-seq data jointly through autoencoder and graph neural network

Machine learning and statistical methods for clustering single-cell RNA-sequencing data

Clustering Single-Cell RNA Sequencing Data by Deep Learning Algorithm

scDRMAE: integrating masked autoencoder with residual attention networks to leverage omics feature dependencies for accurate cell clustering

A Fusion Learning Model Based on Deep Learning for Single-Cell RNA Sequencing Data Clustering

Deep learning-based clustering method for single-cell RNA data

A Hybrid Deep Clustering Approach for Robust Cell Type Profiling Using Single-cell RNA-seq Data

Clustering single cell CITE-seq data with a canonical correlation based deep learning method