Deep Embedded Multi-View Clustering via Jointly Learning Latent Representations and Graphs

Zongmo Huang, Yazhou Ren, Xiaorong Pu, Lifang He
DOI: https://doi.org/10.48550/arXiv.2205.03803
2022-05-08
Abstract: With the representation learning capability of the deep learning models, deep embedded multi-view clustering (MVC) achieves impressive performance in many scenarios and has become increasingly popular in recent years. Although great progress has been made in this field, most existing methods merely focus on learning the latent representations and ignore that learning the latent graph of nodes also provides available information for the clustering task. To address this issue, in this paper we propose Deep Embedded Multi-view Clustering via Jointly Learning Latent Representations and Graphs (DMVCJ), which utilizes the latent graphs to promote the performance of deep embedded MVC models from two aspects. Firstly, by learning the latent graphs and feature representations jointly, the graph convolution network (GCN) technique becomes available for our model. With the capability of GCN in exploiting the information from both graphs and features, the clustering performance of our model is significantly promoted. Secondly, based on the adjacency relations of nodes shown in the latent graphs, we design a sample-weighting strategy to alleviate the noisy issue, and further improve the effectiveness and robustness of the model. Experimental results on different types of real-world multi-view datasets demonstrate the effectiveness of DMVCJ.
Machine Learning
What problem does this paper attempt to address?
This paper aims to cluster multi-view data more effectively. Existing deep embedded multi-view clustering (MVC) methods perform well in many scenarios, but most of them focus only on learning latent representations and ignore that the latent graph structure of the nodes also carries information useful for clustering. The paper therefore proposes Deep Embedded Multi-view Clustering via Jointly Learning Latent Representations and Graphs (DMVCJ), which uses the latent graph structure to improve the performance of deep embedded MVC models.

### Main contributions:

1. **Novel deep embedded multi-view clustering network**: The method jointly learns the latent graph structure and the feature representations without requiring explicit graph data, thereby improving clustering performance on multi-view data.
2. **Sample weighting strategy based on node in-degree**: A simple and effective sample-weighting method uses each node's in-degree in the latent adjacency graph to reduce the influence of noisy samples.
3. **Experimental verification**: Experimental results on different types of real-world multi-view datasets demonstrate the effectiveness of the algorithm.

### Method overview:

- **Representation-learning autoencoder**: Multiple autoencoders initialize the latent node representations of each view.
- **Self-supervised GCN module**: An adjacency graph is constructed dynamically on the latent representations of each view, and graph convolution is used to enhance those representations.
- **Global weight calculation module**: Global importance weights for the samples are computed from the adjacency graphs of the multiple views to reduce the influence of noisy samples.
- **Embedded clustering layer**: Global pseudo-labels are generated with a weighted k-means algorithm, and a clustering loss is optimized to further improve the model's representation ability.

### Experimental results:

- **Datasets**: BDGP, handwritten digits, Reuters.
- **Evaluation metrics**: Accuracy (ACC), normalized mutual information (NMI), adjusted Rand index (ARI).
- **Comparison methods**: Traditional single-view clustering methods (such as KMeans and spectral clustering) and recent multi-view clustering methods (such as MVKKM, MLAN, AMVCD, GMC, SAMVC, DEMVC, and SDMVC).

The experimental results show that DMVCJ outperforms the other methods on all three multi-view datasets, especially on the BDGP and handwritten digits datasets.

### Conclusion:

By introducing the self-supervised GCN module and the sample weighting strategy based on node in-degree, DMVCJ effectively uses latent graph structure information to improve clustering performance on multi-view data. Future work could learn weights for the different views to further improve the model.
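
The in-degree sample weighting from the method overview can be illustrated with a small NumPy sketch. This summary does not give the paper's exact graph construction or weighting formula, so everything below is an assumption made for illustration: a directed Euclidean k-nearest-neighbour graph per view, in-degrees summed across views, and weights rescaled to average 1. The function names `knn_adjacency` and `in_degree_weights` are hypothetical.

```python
import numpy as np

def knn_adjacency(z, k):
    """Directed k-NN graph: a[i, j] = 1 if j is among the k nearest neighbours of i.

    Assumption: Euclidean distance on the latent vectors; the paper may differ.
    """
    sq = np.sum(z ** 2, axis=1)
    d = sq[:, None] + sq[None, :] - 2.0 * z @ z.T   # pairwise squared distances
    np.fill_diagonal(d, np.inf)                      # a node is not its own neighbour
    nbrs = np.argsort(d, axis=1)[:, :k]              # k closest nodes for every row
    a = np.zeros(d.shape)
    a[np.arange(z.shape[0])[:, None], nbrs] = 1.0
    return a

def in_degree_weights(views, k=5):
    """Sum in-degrees over all views and rescale so the weights average to 1.

    Assumption: plain summation and mean-1 scaling, chosen only for illustration.
    """
    n = views[0].shape[0]
    indeg = np.zeros(n)
    for z in views:
        indeg += knn_adjacency(z, k).sum(axis=0)     # column sum = in-degree
    return indeg * n / indeg.sum()

# Toy data: 20 clustered samples plus one far-away noisy sample, seen in two views.
rng = np.random.default_rng(0)
view1 = np.vstack([rng.normal(size=(20, 2)), [[100.0, 100.0]]])
view2 = np.vstack([rng.normal(size=(20, 3)), [[100.0, 100.0, 100.0]]])
w = in_degree_weights([view1, view2], k=5)
```

In this toy example no clustered sample selects the distant point as a neighbour, so its in-degree, and hence its weight, is zero. Weights of this kind could then be passed to a weighted k-means step, for example through the `sample_weight` argument of scikit-learn's `KMeans.fit`.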