Collaborative Structure-Preserved Missing Data Imputation for Single-Cell RNA-Seq Clustering
Hang Gao, Wenjun Shen, Rui Li, Cheng Liu, Si Wu
DOI: https://doi.org/10.1109/tcbb.2024.3404013
2024-10-12
IEEE/ACM Transactions on Computational Biology and Bioinformatics
Abstract: Clustering single-cell RNA-seq (scRNA-seq) transcriptome profiles can identify cell types, which helps improve the understanding of disease progression. In practice, however, single-cell expression data often contain a significant number of missing values as a result of technical variability. Missing data is a critical challenge in scRNA-seq clustering analysis because the unknown values do not reflect the underlying true expression levels, which makes it difficult to discover cell types by applying clustering algorithms directly. Various approaches have been developed to overcome the missing data issue in scRNA-seq clustering. Most of them recover missing expression values by borrowing observed data from similar cells or by synthesizing data via generative adversarial networks. As a result, the biologically meaningful cluster structure has not been sufficiently exploited. In this work, we introduce ColImpute, a collaborative structure-preserved missing data imputation approach for scRNA-seq clustering. Specifically, a cluster structure-preserved imputation module and a subspace clustering module, which respectively perform missing data imputation and cell subtype identification, are integrated into a unified optimization framework so that the two networks are trained collaboratively. Consequently, the clustering module contributes cluster-structure information to guide the training of the missing data imputation module. In turn, the cluster structure-preserved imputation module enhances the performance of the clustering module by generating more precisely recovered samples. Promising experimental results show that the proposed method is effective for both data imputation and cell type identification.
computer science, interdisciplinary applications,biochemical research methods,mathematics,statistics & probability