CTEC: a cross-tabulation ensemble clustering approach for single-cell RNA sequencing data analysis

Liang Wang,Chenyang Hong,Jiangning Song,Jianhua Yao
DOI: https://doi.org/10.1093/bioinformatics/btae130
IF: 5.8
2024-03-29
Bioinformatics
Abstract:Abstract Motivation Cell-type clustering is a crucial first step for single-cell RNA-seq data analysis. However, existing clustering methods often provide different results on cluster assignments with respect to their own data pre-processing, choice of distance metrics, and strategies of feature extraction, thereby limiting their practical applications. Results We propose Cross-Tabulation Ensemble Clustering (CTEC) method that formulates two re-clustering strategies (distribution- and outlier-based) via cross-tabulation. Benchmarking experiments on five scRNA-Seq datasets illustrate that the proposed CTEC method offers significant improvements over the individual clustering methods. Moreover, CTEC-DB outperforms the state-of-the-art ensemble methods for single-cell data clustering, with 45.4% and 17.1% improvement over the single-cell aggregated from ensemble clustering method (SAFE) and the single-cell aggregated clustering via Mixture model ensemble method (SAME), respectively, on the two-method ensemble test. Availability and implementation The source code of the benchmark in this work is available at the GitHub repository https://github.com/LWCHN/CTEC.git.
biochemical research methods,biotechnology & applied microbiology,mathematical & computational biology
What problem does this paper attempt to address?