Abstract: Multi-view (or multi-modality) representation learning aims to understand the relationships between different view representations. Existing methods disentangle multi-view representations into consistent and view-specific representations by introducing strong inductive biases, which can limit their generalization ability. In this paper, we propose a novel multi-view representation disentangling method that aims to go beyond inductive biases, ensuring both interpretability and generalizability of the resulting representations. Our method is based on the observation that discovering multi-view consistency in advance can determine the disentangling information boundary, leading to a decoupled learning objective. We also find that the consistency can be easily extracted by maximizing the transformation invariance and clustering consistency between views. These observations lead us to propose a two-stage framework. In the first stage, we obtain multi-view consistency by training a consistent encoder to produce semantically consistent representations across views as well as their corresponding pseudo-labels. In the second stage, we disentangle specificity from comprehensive representations by minimizing the upper bound of mutual information between consistent and comprehensive representations. Finally, we reconstruct the original data by concatenating pseudo-labels and view-specific representations. Our experiments on four multi-view datasets demonstrate that our proposed method outperforms 12 comparison methods in terms of clustering and classification performance. The visualization results also show that the extracted consistency and specificity are compact and interpretable. Our code can be found at https://github.com/Guanzhou-Ke/DMRIB.
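The final reconstruction step described in the abstract (concatenating pseudo-labels with view-specific representations) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the one-hot encoding of pseudo-labels, the function names `one_hot` and `decoder_input`, and all dimensions are assumptions made only for illustration.

```python
import numpy as np

def one_hot(labels, num_clusters):
    """Encode integer pseudo-labels as one-hot rows (an assumed encoding)."""
    out = np.zeros((labels.shape[0], num_clusters), dtype=np.float32)
    out[np.arange(labels.shape[0]), labels] = 1.0
    return out

def decoder_input(pseudo_labels, specific, num_clusters):
    """Concatenate pseudo-labels with one view's specific representation.

    The result is what a hypothetical per-view decoder would consume
    to reconstruct the original data of that view.
    """
    consistent = one_hot(pseudo_labels, num_clusters)
    return np.concatenate([consistent, specific.astype(np.float32)], axis=1)

# Toy data: 4 samples, 3 clusters, 8-dimensional view-specific codes.
rng = np.random.default_rng(0)
pseudo_labels = np.array([0, 2, 1, 2])      # stand-in for stage-one output
specific_view1 = rng.normal(size=(4, 8))    # stand-in for stage-two output
z = decoder_input(pseudo_labels, specific_view1, num_clusters=3)
print(z.shape)
```

In this sketch the first three columns of `z` carry the shared (consistent) information and the remaining eight carry the view-specific information, so each part can be inspected separately.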

Variational Distillation for Multi-View Learning

Disentangled Variational Information Bottleneck for Multiview Representation Learning

Farewell to Mutual Information: Variational Distillation for Cross-Modal Person Re-Identification

Multi-View Information Bottleneck Without Variational Approximation

Uncertainty-Weighted Mutual Distillation for Multi-View Fusion

Rethinking Multi-view Representation Learning via Distilled Disentangling

Deep Variational Multivariate Information Bottleneck -- A Framework for Variational Losses

MV–MR: Multi-Views and Multi-Representations for Self-Supervised Learning and Knowledge Distillation

Learning to Learn with Variational Information Bottleneck for Domain Generalization

Differentiable Information Bottleneck for Deterministic Multi-view Clustering

Towards Better Entity Linking with Multi-View Enhanced Distillation

Disentangling Multi-view Representations Beyond Inductive Bias

Large-Margin Multi-View Information Bottleneck

Multi-View Learning with Incomplete Views

Decoupled representation for multi-view learning

The similarity-consensus regularized multi-view learning for dimension reduction

Information Theory-Guided Heuristic Progressive Multi-View Coding

Instance-wise multi-view representation learning

Towards Consistency and Complementarity: A Multiview Graph Information Bottleneck Approach

Multi-View Representation Learning via Dual Optimal Transportation