Abstract: Recently, self-supervised learning (SSL) has been extensively studied. Theoretically, mutual information maximization (MIM) is an optimal criterion for SSL, with a strong theoretical foundation in information theory. However, it is difficult to directly apply MIM in SSL since the data distribution is not analytically available in applications. In practice, many existing methods can be viewed as approximate implementations of the MIM criterion. This work shows that, based on the invariance property of MI, explicit MI maximization can be applied to SSL under a generic distribution assumption, i.e., a relaxed condition of the data distribution. We further illustrate this by analyzing the generalized Gaussian distribution. From this result, we derive a loss function based on the MIM criterion that uses only second-order statistics. We implement the new loss for SSL and demonstrate its effectiveness via extensive experiments.
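
For reference, the two standard facts the abstract relies on can be written out as follows. This is a textbook sketch, not the paper's derivation: the symbols $z_1, z_2$ (embeddings of two views), $f, g$ (invertible maps), and $\Sigma$ (joint covariance with blocks $\Sigma_{11}, \Sigma_{22}$) are notation assumed here for illustration, and the paper's actual loss may differ in form.

```latex
% Invariance of MI under invertible maps applied to each variable separately:
I\bigl(f(z_1);\, g(z_2)\bigr) = I(z_1;\, z_2)

% Closed form when (z_1, z_2) are jointly Gaussian with covariance
% \Sigma = \begin{pmatrix} \Sigma_{11} & \Sigma_{12} \\ \Sigma_{21} & \Sigma_{22} \end{pmatrix}:
I(z_1;\, z_2) = \frac{1}{2} \log \frac{\det \Sigma_{11} \, \det \Sigma_{22}}{\det \Sigma}
```

The second expression depends only on covariances, which is the sense in which an MI objective can be computed from second-order statistics.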

What problem does this paper attempt to address?

This paper addresses a key problem in self-supervised learning (SSL): how to maximize mutual information (MI) effectively. Its main objectives are:

1. **Construct an MI optimization objective suitable for SSL**: In theory, mutual information maximization (MIM) is an ideal criterion for SSL because it captures nonlinear statistical dependence between variables. In practice, applying MIM directly is difficult because the data distribution is usually unknown or has no analytic form.
2. **Propose an explicit MI maximization method based on second-order statistics**: The paper uses the invariance property of mutual information to show that an explicit MI maximization objective can be applied under a generic distribution assumption. MI can then be computed from second-order statistics even when the specific data distribution is unknown, which offers a new perspective and method for SSL.
3. **Handle high-dimensional matrix determinant computation**: Optimizing MI directly involves the determinants of high-dimensional covariance matrices, which is computationally expensive and numerically unstable. The paper proposes reformulations and approximations that keep the optimization stable and efficient.
4. **Verify the effectiveness of the new method**: Through experiments on standard datasets such as CIFAR-10/100 and ImageNet-100/1K, the paper demonstrates the effectiveness of the proposed method and compares it with existing state-of-the-art methods, reporting superior performance.

### Core contributions of the paper

- **A new perspective on SSL objective design**: Starting from the invariance of mutual information, the paper proposes an explicit MI optimization objective based on second-order statistics.
- **Efficient implementation strategies**: To handle high-dimensional matrix determinant computation, the paper proposes reformulations and approximations that make the objective suitable for end-to-end training.
- **Experimental verification of effectiveness**: Extensive experiments verify the method's performance on multiple benchmark datasets.

### Summary

This paper addresses the difficulty of applying mutual information directly in self-supervised learning by introducing an explicit MI-based optimization objective, and it achieves efficient, stable training through reformulation and approximation of the determinant computation. The experimental results show that the method performs well on multiple benchmarks, making it of both theoretical and practical interest.
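
To make the determinant issue concrete, the sketch below estimates MI between two batches of embeddings under a joint-Gaussian assumption, using only sample covariances. It is a minimal illustration of the textbook Gaussian formula, not the paper's loss or its reformulation: the function name `gaussian_mi`, the ridge term `eps`, and the use of `numpy.linalg.slogdet` in place of a raw determinant are all choices made here for illustration.

```python
import numpy as np


def gaussian_mi(z1, z2, eps=1e-6):
    """Estimate I(z1; z2) in nats, assuming (z1, z2) are jointly Gaussian.

    z1, z2: arrays of shape (n_samples, d1) and (n_samples, d2).
    eps: small ridge added to the covariance diagonal (an assumption of this
         sketch, used to keep the matrices well conditioned).
    """
    n, d1 = z1.shape
    # Stack both views and center them so that z.T @ z gives the covariance.
    z = np.concatenate([z1, z2], axis=1)
    z = z - z.mean(axis=0, keepdims=True)
    cov = z.T @ z / (n - 1)
    cov = cov + eps * np.eye(cov.shape[0])

    # Diagonal blocks are the per-view covariances.
    c11 = cov[:d1, :d1]
    c22 = cov[d1:, d1:]

    # slogdet returns (sign, log|det|); working in log space avoids the
    # overflow/underflow that a raw determinant hits in high dimensions.
    _, logdet_11 = np.linalg.slogdet(c11)
    _, logdet_22 = np.linalg.slogdet(c22)
    _, logdet_joint = np.linalg.slogdet(cov)

    return 0.5 * (logdet_11 + logdet_22 - logdet_joint)
```

As a sanity check, if each coordinate of `z2` has correlation 0.8 with the matching coordinate of `z1` and there are 4 independent coordinates, the closed-form value is $-\tfrac{1}{2} \cdot 4 \cdot \log(1 - 0.8^2) \approx 2.04$ nats, and the estimate should approach it as the sample size grows.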

Explicit Mutual Information Maximization for Self-Supervised Learning

Analysis of High-dimensional Gaussian Labeled-unlabeled Mixture Model via Message-passing Algorithm

More Synergy, Less Redundancy: Exploiting Joint Mutual Information for Self-Supervised Learning

Mutual Information Maximization for Effective Lip Reading

Self-MI: Efficient Multimodal Fusion via Self-Supervised Multi-Task Learning with Auxiliary Mutual Information Maximization

Mutual Information Gradient Estimation for Representation Learning

LSMI-Sinkhorn: Semi-supervised Mutual Information Estimation with Optimal Transport

A Probabilistic Model Behind Self-Supervised Learning

Explicitly Modeling Universality into Self-Supervised Learning

M3MIML: A Maximum Margin Method for Multi-instance Multi-label Learning

Learning Where to Learn in Cross-View Self-Supervised Learning

Siamese Image Modeling for Self-Supervised Vision Representation Learning

The Common Stability Mechanism behind most Self-Supervised Learning Approaches

A robust estimator of mutual information for deep learning interpretability

Self-supervised Learning is More Robust to Dataset Imbalance

Information Flow in Self-Supervised Learning

Matrix Information Theory for Self-Supervised Learning

Online Continual Learning through Mutual Information Maximization

Stable and Fast Deep Mutual Information Maximization Based on Wasserstein Distance

Preserving domain private information via mutual information maximization

Kernel Masked Image Modeling Through the Lens of Theoretical Understanding