Abstract: Previous theoretical work on contrastive learning (CL) with InfoNCE showed that, under certain assumptions, the learned representations uncover the ground-truth latent factors. We argue these theories overlook crucial aspects of how CL is deployed in practice. Specifically, they assume that within a positive pair, all latent factors either vary to a similar extent, or that some do not vary at all. However, in practice, positive pairs are often generated using augmentations such as strong cropping to just a few pixels. Hence, a more realistic assumption is that all latent factors change, with a continuum of variability across these factors. We introduce AnInfoNCE, a generalization of InfoNCE that can provably uncover the latent factors in this anisotropic setting, broadly generalizing previous identifiability results in CL. We validate our identifiability results in controlled experiments and show that AnInfoNCE increases the recovery of previously collapsed information in CIFAR10 and ImageNet, albeit at the cost of downstream accuracy. Additionally, we explore and discuss further mismatches between theoretical assumptions and practical implementations, including extensions to hard negative mining and loss ensembles.

What problem does this paper attempt to address?

The paper addresses the gap between theory and practice in contrastive learning (CL). Existing theoretical work assumes that, within a positive pair, all latent factors either vary to a similar extent or that some do not vary at all. In practice, positive pairs are usually generated with augmentations such as aggressive cropping down to a few pixels, so all latent factors change, but to different extents. Existing theories do not account for this anisotropy. To address it, the authors introduce AnInfoNCE, a generalization of the InfoNCE loss that can provably identify the latent factors in this anisotropic setting, thereby broadly generalizing previous identifiability results. The authors also examine further mismatches between theoretical assumptions and practical implementations, including extensions to hard negative mining and loss ensembles.

### Main Contributions:

1. **Introduction of AnInfoNCE**: A generalized contrastive loss that remains identifiable when the positive-pair distribution is anisotropic, i.e., when latent factors vary to different extents within a pair.
2. **Hard Negative Mining Model**: A model of hard negative mining that is theoretically identifiable, together with an extension of the main identifiability result to loss ensembles.
3. **Experimental Validation**: The new loss is validated in synthetic and image experiments. On CIFAR10 and ImageNet it recovers more of the previously collapsed information, although downstream classification accuracy decreases.
4. **Discussion of Remaining Gaps Between Theory and Practice**: An analysis of how the augmentations used on real data affect the results, and of strategies to further bridge the gap between theory and practice.
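
To make the difference between an isotropic and an anisotropic contrastive objective concrete, here is a minimal NumPy sketch. The summary does not state the loss formula, so the form below is an assumption: similarity is taken to be a negative squared Euclidean distance with one non-negative weight per latent dimension, and uniform weights stand in for the isotropic case. The names `weighted_sq_dist`, `contrastive_loss`, and `weights` are hypothetical, and the weights are plain inputs here, whereas in training they would typically be learned alongside the encoder.

```python
import numpy as np

def _logsumexp(a, axis):
    # Numerically stable log(sum(exp(a))) along one axis.
    m = np.max(a, axis=axis, keepdims=True)
    return np.squeeze(m, axis=axis) + np.log(np.sum(np.exp(a - m), axis=axis))

def weighted_sq_dist(z_a, z_b, weights):
    # out[i, j] = sum_k weights[k] * (z_a[i, k] - z_b[j, k]) ** 2
    diff = z_a[:, None, :] - z_b[None, :, :]
    return np.sum(weights * diff ** 2, axis=-1)

def contrastive_loss(z_anchor, z_positive, weights=None):
    """InfoNCE-style loss over a batch of (anchor, positive) embeddings.

    Assumed form (not taken from the paper): logits are negative weighted
    squared distances. Row i treats column i as the positive and every
    other column as a negative.
    """
    n, d = z_anchor.shape
    if weights is None:
        weights = np.ones(d)  # uniform weights: every dimension counts equally
    logits = -weighted_sq_dist(z_anchor, z_positive, weights)  # shape (n, n)
    log_prob = np.diag(logits) - _logsumexp(logits, axis=1)
    return -np.mean(log_prob)
```

Under this assumed form, a dimension with a small weight contributes little to the distance, so large variation along it within a positive pair is penalized less; that is one way to express that latent factors vary to different extents.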
### Experimental Results:

- **Synthetic Experiments**: On synthetic data, AnInfoNCE achieves high linear identifiability (R² scores) for both content and style latent factors across a wide range of concentration parameters, whereas the standard InfoNCE loss fails to identify the style latent factors.
- **MNIST Experiments**: On MNIST, AnInfoNCE perfectly identifies all latent factors, while the standard InfoNCE loss identifies the style latent factors poorly.
- **Real-World Experiments**: On CIFAR10 and ImageNet, AnInfoNCE achieves higher augmentation readout accuracy and recovers more latent dimensions, but downstream classification accuracy does not improve and in fact decreases.

### Analysis:

AnInfoNCE recovers the latent factors in controlled settings, but on real-world datasets such as CIFAR10 and ImageNet there is a trade-off between augmentation readout accuracy and linear classification readout accuracy. Higher augmentation readout accuracy indicates that the style latent factors are captured better, yet this does not translate into higher classification accuracy. The effect may be related to the augmentations used on real data, and further research is needed to bridge the gap between theory and practice.
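
The linear identifiability scores mentioned above can be illustrated with a short sketch. The summary reports R² scores but not the evaluation protocol, so this assumes the simplest variant: fit an affine least-squares map from the learned representation to the ground-truth latents and report R² per latent dimension on the same data. The function name `linear_r2` and the toy data are hypothetical.

```python
import numpy as np

def linear_r2(z_learned, z_true):
    """R^2 of the best affine map from z_learned to each column of z_true."""
    n = z_learned.shape[0]
    design = np.hstack([z_learned, np.ones((n, 1))])  # append a bias column
    coef, *_ = np.linalg.lstsq(design, z_true, rcond=None)
    residual = z_true - design @ coef
    ss_res = np.sum(residual ** 2, axis=0)
    ss_tot = np.sum((z_true - z_true.mean(axis=0)) ** 2, axis=0)
    return 1.0 - ss_res / ss_tot  # one score per true latent dimension

# Toy data: three ground-truth latents (an assumption for illustration only).
rng = np.random.default_rng(0)
z_true = rng.normal(size=(500, 3))

# A representation that is an invertible linear mix of all three latents.
mixing = np.array([[2.0, 0.0, 1.0],
                   [0.0, 1.0, 0.0],
                   [1.0, 0.0, 3.0]])
r2_full = linear_r2(z_true @ mixing, z_true)

# A representation that keeps only the first two latents and drops the third.
r2_collapsed = linear_r2(z_true[:, :2] @ np.array([[1.0, 2.0], [0.5, -1.0]]), z_true)
```

In this toy setup, `r2_full` is close to 1 for every latent, while `r2_collapsed` is close to 1 for the two retained latents and near 0 for the dropped one. This is the kind of pattern the summary describes when a loss fails to identify some latent factors.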

InfoNCE: Identifying the Gap Between Theory and Practice

Adversarial Contrastive Learning via Asymmetric InfoNCE.

Simpler, Faster, Stronger: Breaking The log-K Curse On Contrastive Learners With FlatNCE

Rethinking InfoNCE: How Many Negative Samples Do You Need?

SINCERE: Supervised Information Noise-Contrastive Estimation REvisited

InfoGCL: Information-Aware Graph Contrastive Learning

InfoNCE is variational inference in a recognition parameterised model

Revisiting Recommendation Loss Functions through Contrastive Learning (Technical Report)

A New Mechanism for Eliminating Implicit Conflict in Graph Contrastive Learning

Ranking Info Noise Contrastive Estimation: Boosting Contrastive Learning via Ranked Positives

Full-Attention Driven Graph Contrastive Learning: with Effective Mutual Information Insight

Understanding and Generalizing Contrastive Learning from the Inverse Optimal Transport Perspective.

Empowering Collaborative Filtering with Principled Adversarial Contrastive Loss

Model-Aware Contrastive Learning: Towards Escaping the Dilemmas

Identifiability Results for Multimodal Contrastive Learning

Towards noise contrastive estimation with soft targets for conditional models

Towards a Unified Framework of Contrastive Learning for Disentangled Representations

Contrastive Learning Via Equivariant Representation

Contrastive Learning of Preferences with a Contextual InfoNCE Loss

Towards a rigorous analysis of mutual information in contrastive learning

Towards generalizable Graph Contrastive Learning: An information theory perspective