DCRELM: dual correlation reduction network-based extreme learning machine for single-cell RNA-seq data clustering

Qingyun Gao, Qing Ai
DOI: https://doi.org/10.1038/s41598-024-64217-y
IF: 4.6
2024-06-13
Scientific Reports
Abstract: Single-cell ribonucleic acid sequencing (scRNA-seq) is a high-throughput genomic technique that is utilized to investigate single-cell transcriptomes. Cluster analysis can effectively reveal the heterogeneity and diversity of cells in scRNA-seq data, but existing clustering algorithms struggle with the inherent high dimensionality, noise, and sparsity of scRNA-seq data. To overcome these limitations, we propose a clustering algorithm: the Dual Correlation Reduction network-based Extreme Learning Machine (DCRELM). First, DCRELM obtains low-dimensional, dense resulting features of scRNA-seq data in an extreme learning machine (ELM) random mapping space. Second, the ELM graph distortion module is employed to obtain a dual view of the resulting features, effectively enhancing their robustness. Third, the autoencoder fusion module is employed to learn the attribute and structural information of the resulting features and to merge these two types of information into consistent latent representations of these features. Fourth, the dual information reduction network is used to filter the redundant information and noise in the dual consistent latent representations. Last, a triplet self-supervised learning mechanism is utilized to further improve the clustering performance. Extensive experiments show that DCRELM performs well in terms of clustering performance and robustness. The code is available at https://github.com/gaoqingyun-lucky/awesome-DCRELM.
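The first step, mapping cells into an ELM random feature space, can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the DCRELM implementation: the function name `elm_random_mapping`, the Gaussian input weights scaled by `1/sqrt(n_genes)`, the Gaussian biases, the sigmoid activation, the log normalization, and the hidden size are all hypothetical choices. What it does show is the defining ELM property: the hidden-layer weights are drawn once at random and never trained, and the output is a dense, lower-dimensional matrix even when the input is mostly zeros.

```python
import numpy as np

def elm_random_mapping(X, n_hidden=256, seed=0):
    """Map cells (rows of X) into a random ELM hidden-layer feature space.

    Hypothetical settings: Gaussian input weights scaled by 1/sqrt(n_genes),
    Gaussian biases and a sigmoid activation. The weights are drawn once and
    never trained, which is what distinguishes an ELM hidden layer.
    """
    rng = np.random.default_rng(seed)
    n_genes = X.shape[1]
    W = rng.standard_normal((n_genes, n_hidden)) / np.sqrt(n_genes)  # random input weights
    b = rng.standard_normal(n_hidden)                                # random hidden biases
    return 1.0 / (1.0 + np.exp(-(X @ W + b)))                        # sigmoid hidden output H

# Toy usage: a sparse count matrix (about 90% zeros) for 100 cells x 2000 genes
rng = np.random.default_rng(1)
counts = rng.poisson(0.1, size=(100, 2000))
X = np.log1p(counts)                     # log normalization (an assumption here)
H = elm_random_mapping(X, n_hidden=64)
print(H.shape)                           # (100, 64)
```

Every entry of `H` lies strictly between 0 and 1, so the 2000-dimensional sparse input becomes a 64-dimensional dense representation; the fixed `seed` makes the random mapping reproducible across runs.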
multidisciplinary sciences
What problem does this paper attempt to address?
The paper aims to address several key issues in the clustering of single-cell RNA sequencing (scRNA-seq) data:

1. **High dimensionality**: scRNA-seq data are typically high-dimensional, which makes them difficult for clustering algorithms to handle.
2. **Noise**: scRNA-seq data contain a significant amount of noise; in particular, the transcriptional burst effect contributes to data sparsity and an excess of zero values.
3. **Sparsity**: Because RNA capture rates are low, the data contain a large number of zero or near-zero values.

To address these issues, the authors propose a new clustering algorithm, the Dual Correlation Reduction network-based Extreme Learning Machine (DCRELM), which improves clustering performance through the following steps:

- An Extreme Learning Machine (ELM) maps the high-dimensional sparse data to a low-dimensional dense space.
- An ELM graph distortion module produces a dual view of the features, enhancing their robustness.
- An autoencoder fusion module learns the attribute and structural information of the features and fuses the two into consistent latent representations.
- A dual information reduction network removes redundant information and noise.
- Finally, a triplet self-supervised learning mechanism further improves clustering performance.

Experimental results show that DCRELM outperforms several existing state-of-the-art methods on multiple real datasets, achieving higher NMI, ARI, and F1 scores on most of them. In addition, the algorithm effectively identifies different cell subtypes and performs well on datasets of varying scales.
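The "dual correlation reduction" in the method's name refers to reducing redundant correlation between the two distorted views. A minimal NumPy sketch, assuming the objective follows the general dual-correlation-reduction recipe: the cross-view cell-by-cell cosine-similarity matrix is pushed toward the identity (each cell should match only itself in the other view), and the cross-view feature-by-feature correlation matrix is also pushed toward the identity (latent dimensions should not be redundant). The function name, the normalization, and the equal weighting of the two terms are illustrative assumptions, not taken from the DCRELM repository.

```python
import numpy as np

def correlation_reduction_loss(Z1, Z2, eps=1e-12):
    """Toy dual correlation-reduction objective for two views of the same cells.

    Z1, Z2: (n_cells, n_dims) latent representations of the two views.
    Both terms are mean squared errors to the identity matrix.
    """
    # Cell level: L2-normalize rows, then cross-view cosine similarities (n x n)
    R1 = Z1 / (np.linalg.norm(Z1, axis=1, keepdims=True) + eps)
    R2 = Z2 / (np.linalg.norm(Z2, axis=1, keepdims=True) + eps)
    S_cells = R1 @ R2.T
    # Feature level: L2-normalize columns, then cross-view dimension correlations (d x d)
    C1 = Z1 / (np.linalg.norm(Z1, axis=0, keepdims=True) + eps)
    C2 = Z2 / (np.linalg.norm(Z2, axis=0, keepdims=True) + eps)
    S_feats = C1.T @ C2
    loss_cells = np.mean((S_cells - np.eye(S_cells.shape[0])) ** 2)
    loss_feats = np.mean((S_feats - np.eye(S_feats.shape[0])) ** 2)
    return float(loss_cells + loss_feats)

# Toy usage: an ideal representation versus a fully redundant one
Z_good = np.eye(4)        # 4 cells x 4 dims: distinct cells, uncorrelated dimensions
Z_bad = np.ones((4, 4))   # every cell and every dimension identical
print(round(correlation_reduction_loss(Z_good, Z_good), 6))  # 0.0
print(round(correlation_reduction_loss(Z_bad, Z_bad), 4))    # 1.5
```

In training, `Z1` and `Z2` would be the latent representations of the two distorted views, and minimizing this quantity discourages both collapsed cell embeddings and redundant latent dimensions.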
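The summary does not spell out the triplet self-supervised mechanism. A common building block for such mechanisms is the Student's t soft assignment and sharpened target distribution from Deep Embedded Clustering (DEC); the sketch below shows that building block under explicit assumptions. It assumes three latent representations of the same cells (hypothetically two view-specific ones and a fused one), builds the target `P` from the fused representation, and penalizes KL(P || average of the three soft assignments). This combination rule, the fixed cluster centers, and all function names are hypothetical, not DCRELM's confirmed design.

```python
import numpy as np

def soft_assign(Z, centers, alpha=1.0):
    """Student's t soft assignment q_ij of cell i to cluster j (DEC-style)."""
    d2 = ((Z[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)  # squared distances (n, k)
    q = (1.0 + d2 / alpha) ** (-(alpha + 1.0) / 2.0)
    return q / q.sum(axis=1, keepdims=True)

def target_distribution(q):
    """Sharpen q: square it, divide by soft cluster frequency, renormalize per cell."""
    w = q ** 2 / q.sum(axis=0)
    return w / w.sum(axis=1, keepdims=True)

def triplet_kl(p, qs, eps=1e-12):
    """KL(P || mean of the three Q's). The averaging rule is an assumption."""
    q_mean = np.mean(qs, axis=0)
    return float(np.sum(p * np.log((p + eps) / (q_mean + eps))))

# Toy usage: 20 cells in 2 well-separated groups, 2 fixed cluster centers
rng = np.random.default_rng(0)
centers = np.array([[0.0, 0.0], [5.0, 5.0]])
Z_fused = np.vstack([rng.normal(0, 0.5, (10, 2)), rng.normal(5, 0.5, (10, 2))])
Z_a = Z_fused + rng.normal(0, 0.1, Z_fused.shape)   # hypothetical first view
Z_b = Z_fused + rng.normal(0, 0.1, Z_fused.shape)   # hypothetical second view
q_fused = soft_assign(Z_fused, centers)
P = target_distribution(q_fused)
loss = triplet_kl(P, [q_fused, soft_assign(Z_a, centers), soft_assign(Z_b, centers)])
labels = q_fused.argmax(axis=1)   # first 10 cells -> cluster 0, last 10 -> cluster 1
```

Squaring the soft assignments makes `P` more confident than `Q`, so minimizing the KL term pulls all three representations toward the high-confidence assignments; final cluster labels are read off with `argmax`.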