Abstract: Contrastive learning (CL) continues to achieve significant breakthroughs across multiple domains. However, the most common InfoNCE-based methods suffer from several dilemmas, such as the \textit{uniformity-tolerance dilemma} (UTD) and \textit{gradient reduction}, both of which are related to a $\mathcal{P}_{ij}$ term. It has been identified that UTD can lead to unexpected performance degradation. We argue that the fixity of temperature is to blame for UTD. To tackle this challenge, we enrich the CL loss family by presenting a Model-Aware Contrastive Learning (MACL) strategy, whose temperature adapts to the magnitude of alignment, which reflects the basic confidence of the instance discrimination task; this in turn enables the CL loss to adjust the penalty strength for hard negatives adaptively. Regarding the other dilemma, the gradient reduction issue, we derive the limits of an involved gradient scaling factor, which allows us to explain from a unified perspective why some recent approaches are effective with fewer negative samples, and we present a gradient reweighting scheme to escape this dilemma. Extensive empirical results in the vision, sentence, and graph modalities validate our approach's general improvement for representation learning and downstream tasks.

What problem does this paper attempt to address?

This paper addresses two main problems in contrastive learning (CL):

1. **Uniformity-Tolerance Dilemma (UTD)**
   - **Problem description**: In contrastive learning, the temperature parameter $\tau$ controls how strongly negative samples are penalized. A smaller $\tau$ improves the uniformity of the embedding space but reduces tolerance for false negatives (FNs); a larger $\tau$ helps explore potential semantic relationships but makes it harder to learn separable, informative features.
   - **Solution**: The paper proposes a model-aware temperature strategy in which the temperature is adjusted adaptively according to the degree of alignment between positive sample pairs. Specifically, when the model is under-trained, a smaller temperature is used to improve the uniformity of the embedding space; when the model is well-trained, a larger temperature is used to increase tolerance for potential semantic relationships.
2. **Gradient Reduction Dilemma**
   - **Problem description**: When the number of negative samples $K$ is small, the gradient scaling factor $W_i$ becomes significantly smaller. The resulting gradient reduction hinders learning, especially in low-precision floating-point training.
   - **Solution**: By analyzing the properties of the gradient scaling factor $W_i$, the paper proposes a reweighting method. This method readjusts the loss function by introducing an upper bound $V_i$, so that gradient reduction can be avoided even when the number of negative samples is small.

Through these two improvements, the paper aims to improve the representation learning performance of contrastive learning across modalities and to resolve the dilemmas in existing methods. Experimental results show that the proposed model-aware contrastive learning (MACL) strategy achieves significant performance improvements on multiple benchmarks spanning image, sentence, and graph representation learning.
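The two effects described above can be illustrated with a small, self-contained sketch. It is not the paper's implementation: the linear temperature schedule in `adaptive_temperature` and its parameters `tau0`, `alpha`, and `a0` are hypothetical, and `gradient_scale` assumes the scaling factor is the softmax probability mass on the negatives, $1 - \mathcal{P}_{ii}$, which is what the gradient of a standard InfoNCE loss yields. The paper's exact definitions of $W_i$ and $V_i$ are not given in this summary.

```python
import math


def infonce_probs(sim_pos, sim_negs, tau):
    """Softmax over one positive and K negative similarities at temperature tau.

    Returns (P_pos, [P_neg_1, ..., P_neg_K]); the values sum to 1.
    """
    logits = [sim_pos / tau] + [s / tau for s in sim_negs]
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    return exps[0] / total, [e / total for e in exps[1:]]


def adaptive_temperature(alignment, tau0=0.1, alpha=0.5, a0=0.0):
    """Hypothetical schedule: the temperature grows linearly with alignment.

    Low alignment (under-trained model) gives a smaller temperature,
    high alignment (well-trained model) gives a larger one.
    """
    return tau0 * (1.0 + alpha * (alignment - a0))


def hardest_negative_share(sim_pos, sim_negs, tau):
    """Fraction of the total negative probability held by the hardest negative."""
    _, p_negs = infonce_probs(sim_pos, sim_negs, tau)
    return max(p_negs) / sum(p_negs)


def gradient_scale(sim_pos, sim_negs, tau):
    """Assumed scaling factor: probability mass on negatives, 1 - P_pos."""
    p_pos, _ = infonce_probs(sim_pos, sim_negs, tau)
    return 1.0 - p_pos


if __name__ == "__main__":
    # Temperature rises as positive pairs become better aligned.
    for alignment in (0.2, 0.5, 0.9):
        print("alignment", alignment, "tau", adaptive_temperature(alignment))

    # A smaller temperature concentrates the penalty on the hardest negative.
    negs = [0.7, 0.1, 0.0]
    for tau in (0.1, 0.5):
        print("tau", tau, "hardest share", hardest_negative_share(0.8, negs, tau))

    # With fewer negatives, the scaling factor (and so the gradient) shrinks.
    for k in (4, 64, 256):
        print("K", k, "scale", gradient_scale(0.8, [0.1] * k, 0.2))
```

Under these assumptions, dividing each sample's loss by a detached copy of the scaling factor would be one way to counteract the shrinkage at small $K$; whether this matches the paper's reweighting with $V_i$ cannot be confirmed from the summary alone.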

Model-Aware Contrastive Learning: Towards Escaping the Dilemmas
