Single-cell Multi-view Clustering via Community Detection with Unknown Number of Clusters

Dayu Hu, Zhibin Dong, Ke Liang, Jun Wang, Siwei Wang, Xinwang Liu
2023-11-28
Abstract: Single-cell multi-view clustering enables the exploration of cellular heterogeneity within the same cell from different views. Despite the development of several multi-view clustering methods, two primary challenges persist. Firstly, most existing methods treat the information from both single-cell RNA (scRNA) and single-cell Assay of Transposase Accessible Chromatin (scATAC) views as equally significant, overlooking the substantial disparity in data richness between the two views. This oversight frequently leads to a degradation in overall performance. Additionally, the majority of clustering methods necessitate manual specification of the number of clusters by users. However, for biologists dealing with cell data, precisely determining the number of distinct cell types poses a formidable challenge. To this end, we introduce scUNC, an innovative multi-view clustering approach tailored for single-cell data, which seamlessly integrates information from different views without the need for a predefined number of clusters. The scUNC method comprises several steps: initially, it employs a cross-view fusion network to create an effective embedding, which is then utilized to generate initial clusters via community detection. Subsequently, the clusters are automatically merged and optimized until no further clusters can be merged. We conducted a comprehensive evaluation of scUNC using three distinct single-cell datasets. The results underscored that scUNC outperforms the other baseline methods.
Genomics, Artificial Intelligence, Machine Learning
What problem does this paper attempt to address?
This paper addresses two key challenges in single-cell multi-view clustering:

1. **Differences in information richness among views**: Most existing multi-view clustering methods treat single-cell RNA (scRNA) data and single-cell Assay of Transposase Accessible Chromatin (scATAC) data as equally important. In practice, the scRNA view carries richer information than the scATAC view, and treating the two views identically often degrades overall clustering performance.
2. **Automatic determination of the number of clusters**: Most existing clustering methods require users to specify the number of clusters manually. For biological researchers, accurately determining the number of distinct cell types is very difficult, so a method that determines the number of clusters automatically is needed.

To address both problems, the authors propose a new multi-view clustering method, scUNC (Single-cell Uncertain Number of Clusters), which proceeds in three steps:

1. **Cross-View Fusion Network (CVFN)**: CVFN automatically assigns weights to the scRNA and scATAC views so that information from the two views is fused effectively, which addresses the difference in information richness.
2. **Community detection**: Initial clusters are generated through community detection rather than the traditional k-means method. Community detection can better capture the relationships between cells.
3. **Iterative merging**: The dip test, a statistical test of unimodality, is used to evaluate and merge similar clusters until no further merging is possible. This process does not require users to specify the number of clusters.
scUNC was evaluated on three single-cell multi-view datasets, and the results show that it outperforms the other baseline methods in clustering performance.
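The three steps described above can be illustrated with a minimal, standard-library Python sketch. It is not the authors' implementation, and every name in it (`fuse_views`, `knn_graph`, `label_propagation`, `merge_clusters`, `w_rna`, `ratio`) is hypothetical. Three simplifying assumptions are made: a fixed view weight stands in for the weights that CVFN learns, a deterministic label propagation stands in for whichever community detection algorithm the paper uses, and a crude nearest-gap check stands in for the dip test.

```python
from collections import Counter
from math import dist
from statistics import median


def fuse_views(z_rna, z_atac, w_rna=0.7):
    """Weighted sum of two per-cell embeddings (assumption: fixed weight;
    in scUNC the weights are learned by the fusion network)."""
    w_atac = 1.0 - w_rna
    return [tuple(w_rna * r + w_atac * a for r, a in zip(zr, za))
            for zr, za in zip(z_rna, z_atac)]


def knn_graph(points, k=2):
    """Undirected k-nearest-neighbour graph as an adjacency dict of sets."""
    adj = {i: set() for i in range(len(points))}
    for i, p in enumerate(points):
        others = sorted((j for j in range(len(points)) if j != i),
                        key=lambda j: dist(p, points[j]))
        for j in others[:k]:
            adj[i].add(j)
            adj[j].add(i)
    return adj


def label_propagation(adj):
    """Deterministic label propagation (stand-in community detection).
    A node switches only to a strictly more frequent neighbour label,
    so the loop always terminates."""
    labels = {i: i for i in adj}
    changed = True
    while changed:
        changed = False
        for i in sorted(adj):
            counts = Counter(labels[j] for j in adj[i])
            if not counts:
                continue
            # Most frequent neighbour label; ties go to the smallest label.
            best = min(counts, key=lambda lab: (-counts[lab], lab))
            if counts[best] > counts.get(labels[i], 0):
                labels[i] = best
                changed = True
    return labels


def merge_clusters(points, labels, ratio=3.0):
    """Merge clusters until no pair passes the merge test.
    Stand-in test (NOT the dip test): merge two clusters when their closest
    pair of points is within ratio * the median nearest-neighbour distance."""
    nn = [min(dist(p, q) for j, q in enumerate(points) if j != i)
          for i, p in enumerate(points)]
    threshold = ratio * median(nn)
    by_label = {}
    for i, lab in labels.items():
        by_label.setdefault(lab, []).append(i)
    groups = list(by_label.values())
    merged = True
    while merged:
        merged = False
        for a in range(len(groups)):
            for b in range(a + 1, len(groups)):
                gap = min(dist(points[i], points[j])
                          for i in groups[a] for j in groups[b])
                if gap <= threshold:
                    groups[a].extend(groups.pop(b))
                    merged = True
                    break
            if merged:
                break
    return groups


# Toy data: two well-separated groups of six cells, one 2-D embedding per view.
z_rna = [(x / 10, 0.0) for x in range(6)] + [(5 + x / 10, 0.0) for x in range(6)]
z_atac = [(x / 10, 1.0) for x in range(6)] + [(5 + x / 10, 1.0) for x in range(6)]

fused = fuse_views(z_rna, z_atac)
labels = label_propagation(knn_graph(fused, k=2))
final = merge_clusters(fused, labels)
print(len(final))  # prints 2: the cluster count was never given as input
```

The point of the sketch is the control flow: the number of clusters is never passed in, and it emerges from merging an initial partition until the merge test rejects every remaining pair. Swapping in a real dip test would only change the condition inside `merge_clusters`.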