Multi-view Semantic Consistency based Information Bottleneck for Clustering

Wenbiao Yan, Jihua Zhu, Yiyang Zhou, Yifei Wang, Qinghai Zheng
2023-02-28
Abstract: Multi-view clustering can make use of multi-source information for unsupervised clustering. Most existing methods focus on learning a fused representation matrix, while ignoring the influence of private information and noise. To address this limitation, we introduce a novel Multi-view Semantic Consistency based Information Bottleneck for clustering (MSCIB). Specifically, MSCIB pursues semantic consistency to improve the learning process of information bottleneck for different views. It conducts the alignment operation of multiple views in the semantic space and jointly achieves the valuable consistent information of multi-view data. In this way, the learned semantic consistency from multi-view data can improve the information bottleneck to more exactly distinguish the consistent information and learn a unified feature representation with more discriminative consistent information for clustering. Experiments on various types of multi-view datasets show that MSCIB achieves state-of-the-art performance.
Machine Learning, Artificial Intelligence
What problem does this paper attempt to address?
This paper addresses a limitation of multi-view clustering: most existing methods focus on learning a fused representation matrix while ignoring the influence of private information and noise. Each view in multi-view data carries both common semantic information and view-private information. The private information is meaningless or even misleading for clustering and can degrade the quality of the fused features, which leads to poor clustering results. To overcome this limitation, the authors propose a new framework, Multi-view Semantic Consistency based Information Bottleneck for clustering (MSCIB). It uses semantic consistency to improve the learning process of the information bottleneck, so that consistent information is distinguished more accurately and a more discriminative consistent feature representation is learned for clustering.

### Main contributions:

1. **Propose a new deep multi-view clustering method**: The method removes redundant information from the original features while mining the common information shared by the multi-view features, which improves the interpretability of the feature representation.
2. **Use semantic consistency to guide the information bottleneck theory**: A semantic consistency loss constraint reduces the distance between similar view representations, so that more discriminative feature representations are learned.
3. **Verify the effectiveness of MSCIB through extensive experiments**: The experimental results show that MSCIB outperforms existing multi-view clustering methods on multiple datasets.

### Method overview:

- **Feature reconstruction**: A variational auto-encoder (VAE) encodes and decodes the features of each view, and the reconstruction loss \( L_{\text{Rec}} \) ensures that the network thoroughly learns the original feature information.
- **Information bottleneck**: Information bottleneck theory is used to learn a compact and discriminative feature representation by maximizing the consistent information between views and minimizing the redundant information.
- **Multi-view semantic consistency**: A multi-layer perceptron (MLP) network learns the semantic information of each view, and the contrastive loss \( L_{\text{Sem}} \) enhances the consistency between different views.

### Experimental results:

- **Clustering performance**: On multiple datasets, MSCIB clusters better than the other multi-view clustering methods it is compared with, especially on large-scale and noisy datasets.
- **Visualization analysis**: t-SNE visualization shows that the clustering structure of the consistent representation matrix \( Z \) gradually becomes clearer as training progresses.
- **Parameter sensitivity analysis**: The experimental results show that MSCIB is insensitive to the hyperparameters \( \lambda_1 \) and \( \lambda_2 \) and to the temperature coefficient \( \tau \), which indicates good robustness.

### Conclusion:

By combining information bottleneck theory with semantic consistency, MSCIB removes redundant information in multi-view clustering tasks and learns more discriminative consistent feature representations, which improves clustering performance.
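
To make the multi-view semantic consistency step from the method overview more concrete, the sketch below computes a contrastive loss with temperature \( \tau \) over the semantic outputs of several views. It is only an illustration under stated assumptions: the summary does not give the exact form of \( L_{\text{Sem}} \), so a common InfoNCE-style formulation with cosine similarity is assumed here, and the function names (`cosine`, `contrastive_loss`, `semantic_consistency_loss`) and the default `tau=0.5` are hypothetical rather than taken from the paper.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v + 1e-12)


def contrastive_loss(view_a, view_b, tau=0.5):
    """InfoNCE-style loss (assumed form, not the paper's exact definition).

    Row i of view_a and row i of view_b describe the same sample and form
    the positive pair; every other row of view_b acts as a negative.
    """
    n = len(view_a)
    total = 0.0
    for i in range(n):
        logits = [cosine(view_a[i], view_b[j]) / tau for j in range(n)]
        # Numerically stable log-sum-exp over all candidates.
        m = max(logits)
        log_denominator = m + math.log(sum(math.exp(x - m) for x in logits))
        # Negative log-probability of picking the positive pair.
        total += log_denominator - logits[i]
    return total / n


def semantic_consistency_loss(views, tau=0.5):
    """Sum the symmetric contrastive loss over every pair of views."""
    loss = 0.0
    for p in range(len(views)):
        for q in range(p + 1, len(views)):
            loss += contrastive_loss(views[p], views[q], tau)
            loss += contrastive_loss(views[q], views[p], tau)
    return loss


# Toy semantic outputs for two samples seen from two views.
view_1 = [[1.0, 0.0], [0.0, 1.0]]
view_2 = [[0.9, 0.1], [0.1, 0.9]]

aligned = semantic_consistency_loss([view_1, view_2])
misaligned = semantic_consistency_loss([view_1, view_2[::-1]])
print(aligned, misaligned)
```

In this toy example the two views agree on which sample is which, so the aligned loss comes out smaller than the loss obtained after reversing the sample order of the second view. A lower temperature `tau` sharpens the softmax and penalizes mismatched pairs more strongly.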