Explicit Mutual Information Maximization for Self-Supervised Learning

Lele Chang, Peilin Liu, Qinghai Guo, Fei Wen
2024-09-13
Abstract: Recently, self-supervised learning (SSL) has been extensively studied. Theoretically, mutual information maximization (MIM) is an optimal criterion for SSL, with a strong theoretical foundation in information theory. However, it is difficult to directly apply MIM in SSL since the data distribution is not analytically available in applications. In practice, many existing methods can be viewed as approximate implementations of the MIM criterion. This work shows that, based on the invariance property of MI, explicit MI maximization can be applied to SSL under a generic distribution assumption, i.e., a relaxed condition of the data distribution. We further illustrate this by analyzing the generalized Gaussian distribution. Based on this result, we derive a loss function based on the MIM criterion using only second-order statistics. We implement the new loss for SSL and demonstrate its effectiveness via extensive experiments.
Computer Vision and Pattern Recognition, Machine Learning
What problem does this paper attempt to address?
This paper addresses a key problem in self-supervised learning (SSL): how to maximize mutual information (MI) effectively. Specifically, its main objectives are:

1. **Construct an MI optimization objective suitable for SSL**: In theory, mutual information maximization (MIM) is an ideal criterion for self-supervised learning because it can capture nonlinear statistical dependencies between variables. In practice, however, applying MIM directly is very difficult because the data distribution is usually unknown or cannot be expressed analytically.
2. **Propose an explicit MI maximization method based on second-order statistics**: The paper uses the invariance property of mutual information to show that an explicit MI maximization objective can be applied under a generic distribution assumption. This allows MI to be computed from second-order statistics even when the specific data distribution is unknown, providing a new perspective and method for SSL.
3. **Handle the determinant computation for high-dimensional matrices**: Directly optimizing MI involves the determinants of high-dimensional covariance matrices, which is computationally expensive and numerically unstable. The paper therefore proposes a reformulation and an approximation that keep the optimization stable and efficient.
4. **Verify the effectiveness of the new method**: Through experiments on standard datasets such as CIFAR-10/100 and ImageNet-100/1K, the paper demonstrates the effectiveness of the proposed method and compares it with existing state-of-the-art methods, reporting superior performance.

### Core contributions of the paper

- **A new perspective on SSL objective design**: Starting from the invariance of mutual information, the paper proposes an explicit MI optimization objective based on second-order statistics.
- **Efficient implementation strategies**: To address the determinant computation for high-dimensional matrices, the paper proposes a reformulation and an approximation that make the objective suitable for end-to-end training.
- **Experimental verification of effectiveness**: Extensive experiments on multiple benchmark datasets support the method's performance.

### Summary

This paper tackles the difficulty of applying mutual information directly in self-supervised learning by introducing an explicit MI-based optimization objective, and it achieves efficient and stable training through a reformulation and an approximation of that objective. The experimental results show that the method performs well on multiple benchmarks and is of both theoretical and practical significance.
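
To make the idea of computing MI from second-order statistics concrete, the sketch below uses the textbook closed form for jointly Gaussian variables, $I(Z_1; Z_2) = \tfrac{1}{2}\log\frac{\det C_{11}\,\det C_{22}}{\det C}$, where $C$ is the joint covariance and $C_{11}$, $C_{22}$ are its diagonal blocks. This is only an illustration under a Gaussian assumption; it is not the paper's loss, and it does not reproduce the paper's reformulation or approximation. The function names, the ridge term `eps`, and the synthetic data are hypothetical choices made for this example.

```python
import numpy as np

def gaussian_mi_from_cov(joint_cov, d):
    """MI in nats between the first d and the remaining coordinates,
    assuming the joint vector is Gaussian with covariance joint_cov."""
    c11 = joint_cov[:d, :d]
    c22 = joint_cov[d:, d:]
    # slogdet returns (sign, log|det|), which avoids overflow or underflow
    # that det() followed by log() can hit in high dimensions.
    _, logdet11 = np.linalg.slogdet(c11)
    _, logdet22 = np.linalg.slogdet(c22)
    _, logdet_joint = np.linalg.slogdet(joint_cov)
    return 0.5 * (logdet11 + logdet22 - logdet_joint)

def gaussian_mi_from_samples(z1, z2, eps=1e-6):
    """Plug-in estimate from two (N, d) sample matrices.
    eps is a small ridge (an assumption of this sketch) that keeps the
    sample covariance well conditioned."""
    d = z1.shape[1]
    z = np.concatenate([z1, z2], axis=1)
    cov = np.cov(z, rowvar=False) + eps * np.eye(z.shape[1])
    return gaussian_mi_from_cov(cov, d)

# Exact check in one dimension: correlation rho gives MI = -0.5 * log(1 - rho^2).
rho = 0.8
cov_2x2 = np.array([[1.0, rho], [rho, 1.0]])
mi_exact = gaussian_mi_from_cov(cov_2x2, 1)
mi_closed_form = -0.5 * np.log(1.0 - rho**2)

# Sample-based check: two synthetic "views" with per-dimension correlation rho.
rng = np.random.default_rng(0)
n, dim = 20000, 4
view1 = rng.standard_normal((n, dim))
view2 = rho * view1 + np.sqrt(1.0 - rho**2) * rng.standard_normal((n, dim))
mi_correlated = gaussian_mi_from_samples(view1, view2)

# Independent views should give an estimate close to zero.
mi_independent = gaussian_mi_from_samples(view1, rng.standard_normal((n, dim)))

print(mi_exact, mi_closed_form, mi_correlated, mi_independent)
```

The estimate needs only the covariance blocks of the two views, which is what makes a second-order objective attractive. The log-determinants are also where the cost and instability noted in objective 3 come from once the embedding dimension is large.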