Abstract: Representation learning is a cornerstone of modern deep learning. It exposes distinctive features in the latent space and helps interpret deep models. In medical image segmentation, however, the complexity of anatomical patterns and the random distribution of lesions make it difficult to disentangle representations and identify salient features. Methods driven by mutual information (MI) maximization, particularly contrastive learning, have proved highly effective at decoupling densely entangled representations. Yet the effectiveness of contrastive learning depends heavily on the quality of its positive and negative sample pairs. If MI is simply averaged over all views without any selection, learning is hindered, so choosing the views is critical. In this work, we introduce a novel approach based on representation distance-based MI maximization that measures the significance of different views, enabling more efficient contrastive learning and representation disentanglement. We further introduce an MI re-ranking strategy for representation selection that benefits both continuous MI estimation and the measurement of representation significance distance. Specifically, we extract multi-view representations from the frequency domain and re-evaluate their significance according to the MI across frequencies. This yields a multifaceted contrastive learning scheme that strengthens semantic understanding. Statistical results on five metrics show that the proposed framework effectively constrains MI maximization-driven representation selection and guides the multi-view contrastive learning process.
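
The abstract does not specify how the frequency-domain views are built or which MI estimator is used. The following minimal Python/NumPy sketch shows one plausible reading and is not the authors' implementation. It makes several assumptions. Each view is a radial frequency band of the image, obtained with an FFT mask. A fixed random linear projection stands in for the encoder. MI between the full image and each band is estimated with the InfoNCE lower bound (log N minus the contrastive cross-entropy, using a cosine-similarity critic). The bands are then re-ranked by that estimate and the top-k are kept. The helpers `frequency_views`, `encode`, `infonce_mi` and `rerank_views` are hypothetical names.

```python
import numpy as np


def frequency_views(images, n_bands=3):
    """Split each image into radial frequency bands (hypothetical view construction).

    images: (N, H, W) array. Returns n_bands arrays of shape (N, H, W); view b keeps
    only the Fourier coefficients whose distance from the spectrum centre falls in band b.
    """
    _, h, w = images.shape
    spectrum = np.fft.fftshift(np.fft.fft2(images), axes=(-2, -1))  # DC moved to centre
    yy, xx = np.mgrid[0:h, 0:w]
    radius = np.hypot(yy - h // 2, xx - w // 2)
    edges = np.linspace(0.0, radius.max() + 1e-6, n_bands + 1)
    views = []
    for b in range(n_bands):
        # Disjoint radial masks that together cover the whole spectrum.
        mask = (radius >= edges[b]) & (radius < edges[b + 1])
        band = np.fft.ifft2(np.fft.ifftshift(spectrum * mask, axes=(-2, -1)))
        views.append(band.real)
    return views


def encode(x, proj):
    """Stand-in encoder (hypothetical): flatten, linearly project, L2-normalise."""
    z = x.reshape(x.shape[0], -1) @ proj
    return z / (np.linalg.norm(z, axis=1, keepdims=True) + 1e-12)


def infonce_mi(z_a, z_b, temperature=0.1):
    """InfoNCE lower bound on I(z_a; z_b): log N minus the contrastive cross-entropy.

    Row i of z_a and row i of z_b form the positive pair; the other rows are negatives.
    The critic is cosine similarity / temperature (inputs are assumed L2-normalised).
    """
    n = z_a.shape[0]
    logits = z_a @ z_b.T / temperature
    logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    nce_loss = -np.mean(np.diag(log_probs))
    return float(np.log(n) - nce_loss)


def rerank_views(images, views, proj, temperature=0.1, top_k=2):
    """Score each view by estimated MI with the full image and keep the top_k (hypothetical)."""
    z_full = encode(images, proj)
    scores = [infonce_mi(z_full, encode(v, proj), temperature) for v in views]
    order = sorted(range(len(views)), key=lambda i: scores[i], reverse=True)
    return order[:top_k], scores


# Usage on random data (illustrative only; no real images involved).
rng = np.random.default_rng(0)
images = rng.normal(size=(32, 16, 16))
views = frequency_views(images, n_bands=3)
proj = rng.normal(size=(16 * 16, 64))
selected, scores = rerank_views(images, views, proj)
print("selected bands:", selected, "MI estimates:", [round(s, 3) for s in scores])
```

With an untrained critic, the InfoNCE estimate is a loose lower bound that can never exceed log N. In practice the encoder would be trained to tighten it. The re-ranking step itself only relies on the relative order of the scores.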
