Abstract: What do different contrastive learning (CL) losses actually optimize for? Although multiple CL methods have demonstrated remarkable representation learning capabilities, the differences in their inner workings remain largely opaque. In this work, we analyse several CL families and prove that, under certain conditions, they admit the same minimisers when optimizing either their batch-level objectives or their expectations asymptotically. In both cases, an intimate connection with the hyperspherical energy minimisation (HEM) problem resurfaces. Drawing inspiration from this, we introduce a novel CL objective, coined Decoupled Hyperspherical Energy Loss (DHEL). DHEL simplifies the problem by decoupling the target hyperspherical energy from the alignment of positive examples while preserving the same theoretical guarantees. Going one step further, we show the same results hold for another relevant CL family, namely kernel contrastive learning (KCL), with the additional advantage of the expected loss being independent of batch size, thus identifying the minimisers in the non-asymptotic regime. Empirical results demonstrate improved downstream performance and robustness across combinations of different batch sizes and hyperparameters and reduced dimensionality collapse, on several computer vision datasets.

What problem does this paper attempt to address?

This paper asks what different contrastive learning (CL) loss functions actually optimize. It focuses on the following aspects:

1. **Optimization Objectives of Different Contrastive Learning Loss Functions**:
   - Although multiple contrastive learning methods have demonstrated excellent representation learning capabilities, the differences in their internal mechanisms remain opaque. By analyzing several CL families, the paper proves that, under certain conditions, these methods share the same minimisers when optimizing either their batch-level objectives or their expectations asymptotically. In both cases the minimisers are closely related to the Hyperspherical Energy Minimisation (HEM) problem.
2. **Introduction of a New Contrastive Learning Objective**:
   - The paper proposes a new contrastive learning objective, called Decoupled Hyperspherical Energy Loss (DHEL). DHEL simplifies the problem by decoupling the target hyperspherical energy from the alignment of positive samples, while retaining the same theoretical guarantees.
3. **Analysis of Kernel Contrastive Learning (KCL)**:
   - The paper shows that the same results hold for another related CL family, kernel contrastive learning. Because the expected KCL loss is independent of the batch size, its minimisers can also be identified in the non-asymptotic regime, which removes the need for an asymptotic argument.
4. **Experimental Verification**:
   - Experiments on several computer vision datasets evaluate downstream performance and robustness across different batch sizes and hyperparameter combinations, and report reduced dimensionality collapse.

### Specific Problems and Solutions

1. **Consistency of Batch Size and Asymptotic Behavior**:
   - The paper characterizes the optimal solutions in both the single-batch case and the asymptotic-expectation case. In the finite-batch case, when the batch size does not exceed the ambient dimension plus 1, multiple InfoNCE variants share the same unique optimal solution. In the asymptotic case, these variants also exhibit the same behavior, namely perfect alignment of positive pairs and a uniform distribution of representations.
2. **Impact of Decoupling Positive and Negative Samples**:
   - DHEL replaces the classical InfoNCE denominator with a denominator containing only negative samples, so the denominator no longer depends on the positive samples. The alignment term and the uniformity term can then be optimized independently, which simplifies the optimization.
3. **Advantages of Kernel Contrastive Learning**:
   - For KCL, the minimisers can be identified in the non-asymptotic regime, and the expected loss is independent of the batch size. In practice, this makes KCL less sensitive to the choice of batch size.

### Experimental Results

- **Performance and Robustness**:
  - The experimental results show that DHEL and KCL achieve improved downstream performance and robustness across different batch sizes and hyperparameter combinations.
- **Reduction of Dimension Collapse**:
  - DHEL and KCL make use of more dimensions of the representation space, reducing dimensionality collapse.

In summary, through theoretical analysis and experimental verification, the paper proposes a new contrastive learning objective, DHEL, and further explores the advantages of kernel contrastive learning (KCL), providing new ideas for the optimization of contrastive learning methods.
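To make the decoupling idea concrete, here is a minimal pure-Python sketch. It is not the authors' implementation. The function names (`info_nce`, `decoupled_loss`), the temperature value `TAU = 0.5`, the one-directional form of the losses, and the choice to draw negatives only from the other samples of the same view are all assumptions made for illustration. The sketch contrasts an InfoNCE-style loss, whose denominator includes the positive pair, with a negatives-only variant that splits into an alignment term and a uniformity term. It evaluates both on a regular simplex, a configuration in which all pairwise similarities between distinct points are equal.

```python
import math


def normalize(x):
    """Scale a vector to unit length so it lies on the hypersphere."""
    n = math.sqrt(sum(c * c for c in x))
    return [c / n for c in x]


def dot(a, b):
    """Inner product of two equal-length vectors."""
    return sum(p * q for p, q in zip(a, b))


def log_sum_exp(values):
    """Numerically stable log(sum(exp(v)))."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))


def info_nce(u, v, tau=0.5):
    """One-directional InfoNCE: the positive pair also sits in the denominator."""
    total = 0.0
    for i in range(len(u)):
        logits = [dot(u[i], v[j]) / tau for j in range(len(v))]
        total += log_sum_exp(logits) - logits[i]
    return total / len(u)


def decoupled_loss(u, v, tau=0.5):
    """Return the (alignment, uniformity) terms of a negatives-only variant."""
    m = len(u)
    # Alignment: depends only on the positive pairs (u_i, v_i).
    align = -sum(dot(u[i], v[i]) for i in range(m)) / (m * tau)
    uniform = 0.0
    for i in range(m):
        # Assumption: negatives are the other samples of the same view,
        # so this term never sees the positive pair.
        negs = [dot(u[i], u[j]) / tau for j in range(m) if j != i]
        uniform += log_sum_exp(negs)
    return align, uniform / m


# Usage: a regular simplex of M = 4 unit vectors, used for both views.
M, TAU = 4, 0.5
simplex = [normalize([(1.0 if i == j else 0.0) - 1.0 / M for j in range(M)])
           for i in range(M)]
align, uniform = decoupled_loss(simplex, simplex, TAU)
print(f"alignment term: {align:.4f}, uniformity term: {uniform:.4f}")
print(f"InfoNCE on the same batch: {info_nce(simplex, simplex, TAU):.4f}")
```

In this sketch the uniformity term is computed without ever touching `v`, so it can be inspected or tuned separately from the alignment term. In `info_nce` the positive similarity appears in both the numerator and the denominator, which couples the two effects.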

Bridging Mini-Batch and Asymptotic Analysis in Contrastive Learning: From InfoNCE to Kernel-Based Losses
