Abstract: Multi-view representation learning aims to extract comprehensive information from multiple sources. It has achieved significant success in applications such as video understanding and 3D rendering. However, how to improve the robustness and generalization of multi-view representations in unsupervised and incomplete scenarios remains an open question in this field. In this study, we discovered a positive correlation between the semantic distance of multi-view representations and the tolerance for data corruption. Moreover, we found that the information ratio of consistency and complementarity significantly impacts the performance of discriminative and generative tasks related to multi-view representations. Based on these observations, we propose an end-to-end CLustering-guided cOntrastiVE fusioN (CLOVEN) method, which enhances the robustness and generalization of multi-view representations simultaneously. To balance consistency and complementarity, we design an asymmetric contrastive fusion module. The module first combines all view-specific representations into a comprehensive representation through a scaling fusion layer. Then, the comprehensive representation is aligned with each view-specific representation via a contrastive loss function, resulting in a view-common representation that includes both consistent and complementary information. We prevent the module from learning suboptimal solutions by not allowing information alignment between view-specific representations. We also design a clustering-guided module that encourages the aggregation of semantically similar views, which reduces the semantic distance of the view-common representation. We quantitatively and qualitatively evaluate CLOVEN on five datasets, demonstrating its superiority over 13 other competitive multi-view learning methods in terms of clustering and classification performance. In the data-corrupted scenario, our proposed method resists noise interference better than its competitors. Additionally, the visualization demonstrates that CLOVEN succeeds in preserving the intrinsic structure of view-specific representations and improves the compactness of view-common representations. Our code can be found at https://github.com/guanzhou-ke/cloven.
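
The asymmetric alignment described in the abstract can be illustrated with a rough NumPy sketch. This is not the authors' implementation (see the repository linked above for that). Several things here are assumptions made for illustration only: the fusion step is a plain concatenation followed by a linear projection standing in for the "scaling fusion layer", the contrastive term is a standard InfoNCE loss (the abstract does not name the loss), and all function names, shapes, and the temperature value are hypothetical. The one property the sketch is meant to show is the asymmetry: the fused representation is aligned with every view, while views are never aligned with each other.

```python
import numpy as np


def l2_normalize(x, eps=1e-12):
    """Scale each row to unit length so dot products become cosine similarities."""
    return x / (np.linalg.norm(x, axis=1, keepdims=True) + eps)


def fuse_views(views, weight):
    """Assumed fusion: concatenate the views along features, then project linearly.

    views  : list of arrays, each of shape (n_samples, dim)
    weight : array of shape (n_views * dim, dim), a hypothetical learned projection
    """
    return np.concatenate(views, axis=1) @ weight


def info_nce(anchor, target, temperature=0.5):
    """Standard InfoNCE: row i of `anchor` should match row i of `target`."""
    logits = l2_normalize(anchor) @ l2_normalize(target).T / temperature
    logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_prob))


def asymmetric_contrastive_loss(views, weight, temperature=0.5):
    """Align the fused representation with each view; no view-to-view terms."""
    fused = fuse_views(views, weight)
    return sum(info_nce(fused, v, temperature) for v in views) / len(views)


# Toy usage with random data: two views, 8 samples, 4 features each.
rng = np.random.default_rng(0)
views = [rng.normal(size=(8, 4)), rng.normal(size=(8, 4))]
weight = rng.normal(size=(8, 4))
fused = fuse_views(views, weight)
loss = asymmetric_contrastive_loss(views, weight)
```

In a trainable version, `weight` and the view encoders would be optimized jointly in an autodiff framework; the sketch only evaluates the loss once on fixed inputs.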

Partially View-aligned Representation Learning Via Cross-view Graph Contrastive Network

Cross-view Graph Contrastive Representation Learning on Partially Aligned Multi-view Data

Nonparametric Clustering-Guided Cross-View Contrastive Learning for Partially View-Aligned Representation Learning

Graph Contrastive Partial Multi-View Clustering

Partial Multi-View Clustering via Meta-Learning and Contrastive Feature Alignment

Anchor-Sharing and Clusterwise Contrastive Network for Multiview Representation Learning

Dynamic Graph Guided Progressive Partial View-Aligned Clustering

A Clustering-guided Contrastive Fusion for Multi-view Representation Learning

Deep Incomplete Multi-view Clustering with Cross-view Partial Sample and Prototype Alignment

Reliable Representations Learning for Incomplete Multi-View Partial Multi-Label Classification

Graph Contrastive Learning with Cross-view Reconstruction

Dual Contrastive Prediction for Incomplete Multi-View Representation Learning

Contrastive and attentive graph learning for multi-view clustering

Global and local combined contrastive learning for multi-view clustering

ACTIVE: Augmentation-Free Graph Contrastive Learning for Partial Multi-View Clustering

Incomplete multi-view clustering via attention-based contrast learning

Composite attention mechanism network for deep contrastive multi-view clustering

Dual Contrast-Driven Deep Multi-View Clustering

Learning Cross-Modal Aligned Representation with Graph Embedding

Cross-View Representation Learning-Based Deep Multiview Clustering With Adaptive Graph Constraint