Model-Aware Contrastive Learning: Towards Escaping the Dilemmas

Zizheng Huang, Haoxing Chen, Ziqi Wen, Chao Zhang, Huaxiong Li, Bo Wang, Chunlin Chen
DOI: https://doi.org/10.48550/arXiv.2207.07874
2023-06-11
Abstract: Contrastive learning (CL) continuously achieves significant breakthroughs across multiple domains. However, the most common InfoNCE-based methods suffer from some dilemmas, such as *uniformity-tolerance dilemma* (UTD) and *gradient reduction*, both of which are related to a $\mathcal{P}_{ij}$ term. It has been identified that UTD can lead to unexpected performance degradation. We argue that the fixity of temperature is to blame for UTD. To tackle this challenge, we enrich the CL loss family by presenting a Model-Aware Contrastive Learning (MACL) strategy, whose temperature is adaptive to the magnitude of alignment that reflects the basic confidence of the instance discrimination task, then enables CL loss to adjust the penalty strength for hard negatives adaptively. Regarding another dilemma, the gradient reduction issue, we derive the limits of an involved gradient scaling factor, which allows us to explain from a unified perspective why some recent approaches are effective with fewer negative samples, and summarily present a gradient reweighting to escape this dilemma. Extensive remarkable empirical results in vision, sentence, and graph modality validate our approach's general improvement for representation learning and downstream tasks.
Machine Learning
What problem does this paper attempt to address?
This paper attempts to solve two main problems in contrastive learning (CL):

1. **Uniformity-Tolerance Dilemma (UTD)**:
   - **Problem description**: In contrastive learning, the temperature parameter \(\tau\) strongly influences how heavily negative samples are penalized. A smaller \(\tau\) improves the uniformity of the embedding space but reduces tolerance for false negatives (FNs); a larger \(\tau\) helps explore potential semantic relationships but is not conducive to learning separable, informative features.
   - **Solution**: The paper proposes a model-aware temperature strategy in which the temperature is adjusted adaptively according to the degree of alignment of positive sample pairs. Specifically, when the model is under-trained, a smaller temperature is used to improve the uniformity of the embedding space; when the model is well-trained, a larger temperature is used to increase the tolerance for potential semantic relationships.
2. **Gradient Reduction Dilemma**:
   - **Problem description**: When the number of negative samples \(K\) is small, the gradient scaling factor \(W_i\) becomes significantly smaller. The resulting gradient reduction hinders learning, especially in low-precision floating-point training.
   - **Solution**: By analyzing the properties of the gradient scaling factor \(W_i\), the paper proposes a reweighting method. This method readjusts the loss function by introducing an upper bound \(V_i\), so that gradient reduction can be avoided even when the number of negative samples is small.

Through these two improvements, the paper aims to improve the representation learning performance of contrastive learning across tasks in different modalities and to resolve the dilemmas in existing methods.
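The two dilemmas described above can be illustrated with a small numerical sketch. This is not the paper's implementation. It assumes that \(W_i\) is the total softmax probability assigned to the negatives, which is the factor that scales the InfoNCE gradient with respect to the positive similarity. The function `adaptive_tau` and its parameters `tau0`, `alpha`, and `a0` are hypothetical, chosen only to show a temperature that grows with alignment; the paper's actual schedule may differ.

```python
import math


def softmax_probs(sims, tau):
    """Softmax over similarities divided by temperature tau (numerically stable)."""
    scaled = [s / tau for s in sims]
    peak = max(scaled)
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def info_nce(pos_sim, neg_sims, tau):
    """Return (loss, w) for one anchor.

    loss is the standard InfoNCE loss; w = 1 - P_pos is the probability mass
    on the negatives (assumed here to play the role of W_i).
    """
    probs = softmax_probs([pos_sim] + list(neg_sims), tau)
    loss = -math.log(probs[0])
    w = 1.0 - probs[0]
    return loss, w


def adaptive_tau(alignment, tau0=0.1, alpha=0.5, a0=0.0):
    """Hypothetical linear schedule: temperature increases with alignment."""
    return tau0 * (1.0 + alpha * (alignment - a0))


if __name__ == "__main__":
    # Gradient reduction: with few negatives, w shrinks and so does the gradient.
    for k in (4, 64, 4096):
        loss, w = info_nce(0.8, [0.1] * k, tau=0.2)
        print(f"K={k:5d}  loss={loss:.4f}  w={w:.4f}")

    # Model-aware temperature: low alignment -> small tau, high alignment -> large tau.
    for a in (0.2, 0.5, 0.9):
        print(f"alignment={a:.1f}  tau={adaptive_tau(a):.4f}")
```

In this sketch the derivative of the loss with respect to the positive similarity equals \(-w/\tau\), so a small `w` directly shrinks the update. That is the motivation for rescaling the loss when \(K\) is small.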
Experimental results show that the proposed Model-Aware Contrastive Learning (MACL) strategy achieves significant performance improvements on multiple benchmark datasets for image, sentence, and graph representation learning.