Attention-based deep clustering method for scRNA-seq cell type identification
Shenghao Li, Hui Guo, Simai Zhang, Yizhou Li, Menglong Li
DOI: https://doi.org/10.1371/journal.pcbi.1011641
2023-11-11
PLoS Computational Biology
Abstract: Single-cell RNA sequencing (scRNA-seq) technology provides higher resolution of cellular differences than bulk RNA sequencing and reveals heterogeneity in biological research. The analysis of scRNA-seq datasets is premised on subpopulation assignment. When an appropriate reference, such as specific marker genes or a single-cell reference atlas, is not available, unsupervised clustering approaches become the predominant option. However, the inherent sparsity and high dimensionality of scRNA-seq datasets pose specific analytical challenges to traditional clustering methods. Therefore, various deep learning-based methods have been proposed to address these challenges. Because each of these methods addresses only part of the problem, a more comprehensive method is needed. In this article, we propose a novel scRNA-seq data clustering method named AttentionAE-sc (Attention fusion AutoEncoder for single-cell). It combines two different scRNA-seq clustering strategies through an attention mechanism: zero-inflated negative binomial (ZINB)-based methods, which deal with the impact of dropout events, and graph autoencoder (GAE)-based methods, which rely on information from neighbors to guide the dimension reduction. Based on an iterative fusion between denoising and topological embeddings, AttentionAE-sc can easily acquire clustering-friendly cell representations in which similar cells are closer in the hidden embedding. Compared with several state-of-the-art baseline methods, AttentionAE-sc demonstrated excellent clustering performance on 16 real scRNA-seq datasets without the need to specify the number of groups. Additionally, AttentionAE-sc learned improved cell representations and exhibited enhanced stability and robustness. Furthermore, AttentionAE-sc achieved remarkable cell type identification in a breast cancer single-cell atlas dataset and provided valuable insights into the heterogeneity among different cell subtypes.
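To make the ZINB idea in the abstract concrete, the following is a minimal sketch of the zero-inflated negative binomial log-probability for a single count, in the common mean/dispersion parameterization. It is not taken from the AttentionAE-sc code: the function names (`zinb_log_prob`, `zinb_nll`) and the parameter names (`mu` for the mean, `theta` for the dispersion, `pi` for the dropout probability) are assumptions chosen for illustration.

```python
import math


def zinb_log_prob(x, mu, theta, pi):
    """Log-probability of a count x under a zero-inflated negative binomial.

    Assumed (hypothetical) parameter names:
      mu    > 0      mean of the negative binomial component
      theta > 0      dispersion (inverse over-dispersion) of that component
      0 < pi < 1     probability that the count is a dropout (structural zero)
    """
    # Log-pmf of the negative binomial component in mean/dispersion form.
    log_nb = (
        math.lgamma(x + theta) - math.lgamma(theta) - math.lgamma(x + 1)
        + theta * math.log(theta / (theta + mu))
        + x * math.log(mu / (theta + mu))
    )
    if x == 0:
        # A zero can come from dropout or from the NB component itself.
        return math.log(pi + (1.0 - pi) * math.exp(log_nb))
    # A non-zero count can only come from the NB component.
    return math.log(1.0 - pi) + log_nb


def zinb_nll(counts, mu, theta, pi):
    """Mean negative log-likelihood over paired sequences of counts and parameters."""
    total = sum(zinb_log_prob(x, m, t, p) for x, m, t, p in zip(counts, mu, theta, pi))
    return -total / len(counts)
```

In a ZINB-based autoencoder, a loss of this form is typically evaluated on the raw counts with `mu`, `theta`, and `pi` predicted by the decoder, so that observed zeros are not forced to be explained by low expression alone.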
Single-cell RNA sequencing (scRNA-seq) has been widely used in numerous biological studies to reveal heterogeneity at the cellular level. Accurate cell type identification serves as the foundation for scRNA-seq data analysis, and unsupervised cluster analysis is commonly employed when an appropriate reference is not available. However, the inherent sparsity and high dimensionality of scRNA-seq datasets pose specific analytical challenges to traditional clustering methods. To address this, we propose a novel scRNA-seq data clustering method named AttentionAE-sc (Attention fusion AutoEncoder for single-cell). By integrating denoising representation learning and cluster-friendly representation learning through an attention mechanism, AttentionAE-sc demonstrated outstanding performance in the evaluation phase. First, when compared with benchmark methods on real scRNA-seq datasets, AttentionAE-sc consistently achieved superior external and internal clustering evaluation metrics. Second, AttentionAE-sc exhibited robustness and stability across various experimental conditions. Lastly, AttentionAE-sc not only delivered excellent clustering results but also unveiled potential biological insights from a breast cancer single-cell atlas dataset.
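As a rough illustration of what fusing two representations "through an attention mechanism" can mean, the sketch below combines a denoising embedding and a topological embedding of one cell using softmax weights over the two sources. This is a deliberately simplified stand-in, not the architecture of AttentionAE-sc: the function `attention_fuse` and the scoring vector `score_vec` are hypothetical, and a real model would learn the scoring parameters and may use a different attention form.

```python
import math


def attention_fuse(z_denoise, z_topo, score_vec):
    """Fuse two embeddings of one cell with softmax attention over the two sources.

    z_denoise, z_topo : lists of floats with the same length (one cell each)
    score_vec         : hypothetical learned vector used to score each source
    Returns the fused embedding and the two attention weights.
    """
    # Score each source embedding by its dot product with the scoring vector.
    s_d = sum(a * w for a, w in zip(z_denoise, score_vec))
    s_t = sum(a * w for a, w in zip(z_topo, score_vec))
    # Softmax over the two scores, shifted by the maximum for numerical stability.
    m = max(s_d, s_t)
    e_d, e_t = math.exp(s_d - m), math.exp(s_t - m)
    a_d, a_t = e_d / (e_d + e_t), e_t / (e_d + e_t)
    # The fused embedding is the attention-weighted sum of the two sources.
    fused = [a_d * x + a_t * y for x, y in zip(z_denoise, z_topo)]
    return fused, (a_d, a_t)
```

For example, `attention_fuse([1.0, 0.0], [0.0, 1.0], [1.0, 0.0])` weights the first source more heavily, because only that source has a positive score under the chosen scoring vector.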
biochemical research methods,mathematical & computational biology