Abstract: Cell clustering based on single-cell RNA sequencing (scRNA-seq) is critical for revealing cell heterogeneity and identifying new cell subtypes, but it remains challenging. Because scRNA-seq data are noisy, sparse, and poorly annotated, existing state-of-the-art cell clustering methods usually ignore gene functions and gene interactions. In this study, we propose a feature extraction method, named FEGFS, that analyzes scRNA-seq data by taking advantage of known gene functions. Specifically, we first derive functional gene sets based on Gene Ontology (GO) terms and reduce their redundancy by semantic similarity analysis and gene repetitive rate reduction. Then, we apply kernel principal component analysis to select features on each non-redundant functional gene set, and we combine the selected features from all functional gene sets for subsequent clustering analysis. To test the performance of FEGFS, we apply agglomerative hierarchical clustering to the FEGFS features and compare it with seven state-of-the-art clustering methods on six real scRNA-seq datasets. For small datasets such as Pollen and Goolam, FEGFS outperforms all methods on all four evaluation metrics: adjusted Rand index (ARI), normalized mutual information (NMI), homogeneity score (HOM), and completeness score (COM). For example, the ARIs of FEGFS are 0.955 and 0.910 on Pollen and Goolam, respectively, whereas those of the second-best method are 0.938 and 0.910. For large datasets, FEGFS also outperforms most methods. For example, the ARI of FEGFS is 0.781 on both Klein and Zeisel, which is higher than those of all other methods except SC3 (0.798 and 0.807, respectively). Moreover, we demonstrate that CMF-Impute is powerful in reconstructing cell-to-cell and gene-to-gene correlation and in inferring cell lineage trajectories. 
As an application, we take glioma as an example and demonstrate that our clustering method can identify important cell clusters related to glioma and infer key marker genes related to these cell clusters.
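
The feature extraction and clustering steps described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes scikit-learn's KernelPCA with an RBF kernel (the abstract does not name the kernel), two components per gene set, Ward linkage, and a synthetic expression matrix. The helper extract_features and the four gene sets are hypothetical stand-ins for the GO-derived, redundancy-reduced gene sets.

```python
import numpy as np
from sklearn.cluster import AgglomerativeClustering
from sklearn.decomposition import KernelPCA
from sklearn.metrics import (
    adjusted_rand_score,
    completeness_score,
    homogeneity_score,
    normalized_mutual_info_score,
)


def extract_features(expr, gene_sets, n_components=2, kernel="rbf"):
    """Run kernel PCA on each gene set and concatenate the components.

    expr is a cells x genes matrix; gene_sets is a list of column-index
    arrays. Kernel and component count are assumptions, not paper settings.
    """
    blocks = []
    for gene_idx in gene_sets:
        kpca = KernelPCA(n_components=n_components, kernel=kernel)
        blocks.append(kpca.fit_transform(expr[:, gene_idx]))
    return np.hstack(blocks)


# Synthetic stand-in for an expression matrix: 60 cells, 40 genes,
# three well-separated cell types (not real scRNA-seq data).
rng = np.random.default_rng(0)
true_labels = np.repeat([0, 1, 2], 20)
expr = 3.0 * true_labels[:, None] + rng.normal(0.0, 0.5, size=(60, 40))

# Hypothetical non-redundant functional gene sets, given as column indices.
gene_sets = [np.arange(start, start + 10) for start in range(0, 40, 10)]

# Per-gene-set kernel PCA features, concatenated into one matrix.
features = extract_features(expr, gene_sets)

# Agglomerative hierarchical clustering on the combined features.
model = AgglomerativeClustering(n_clusters=3, linkage="ward")
pred_labels = model.fit_predict(features)

# The four evaluation metrics named in the abstract.
scores = {
    "ARI": adjusted_rand_score(true_labels, pred_labels),
    "NMI": normalized_mutual_info_score(true_labels, pred_labels),
    "HOM": homogeneity_score(true_labels, pred_labels),
    "COM": completeness_score(true_labels, pred_labels),
}
print(features.shape, scores)
```

On real data, the gene sets would come from the GO-based derivation and redundancy reduction, and the number of clusters would be set per dataset. The scores on this synthetic matrix say nothing about the reported results.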

Single-cell RNA-seq data imputation using Feature Propagation

Single-cell RNA sequencing data imputation using bi-level feature propagation

scCGImpute: An Imputation Method for Single-Cell RNA Sequencing Data Based on Similarities between Cells and Relationships among Genes

Imputation method for single-cell RNA-seq data using neural topic model

scTSSR: gene expression recovery for single-cell RNA sequencing using two-side sparse self-representation

Lack of nephritogenicity of systemic activation of the alternate complement pathway.

SmartImpute: A Targeted Imputation Framework for Single-cell Transcriptome Data

Cellular Similarity based Imputation for Single cell RNA Sequencing Data

scGGAN: single-cell RNA-seq imputation by graph-based generative adversarial network

A novel method for single-cell data imputation using subspace regression

GE-Impute: graph embedding-based imputation for single-cell RNA-seq data

scRNMF: An imputation method for single-cell RNA-seq data by robust and non-negative matrix factorization

Collaborative Structure-Preserved Missing Data Imputation for Single-Cell RNA-Seq Clustering

NISC: Neural Network-Imputation for Single-Cell RNA Sequencing and Cell Type Clustering

Imputing dropouts for single-cell RNA sequencing based on multi-objective optimization

A Novel Single-Cell RNA Sequencing Data Feature Extraction Method Based on Gene Function Analysis and Its Applications in Glioma Study

SAE-Impute: imputation for single-cell data via subspace regression and auto-encoders

Single Cell Self-Paced Clustering with Transcriptome Sequencing Data

scINRB: single-cell gene expression imputation with network regularization and bulk RNA-seq data

Robust scRNA-seq Cell Types Identification by Self-Guided Deep Clustering Network

Evaluating Imputation Methods for Single-Cell RNA-seq Data