Abstract: Unsupervised contrastive learning has achieved outstanding success, while the mechanism of contrastive loss has been less studied. In this paper, we concentrate on the understanding of the behaviours of unsupervised contrastive loss. We will show that the contrastive loss is a hardness-aware loss function, and the temperature \( \tau \) controls the strength of penalties on hard negative samples. The previous study has shown that uniformity is a key property of contrastive learning. We build relations between the uniformity and the temperature \( \tau \). We will show that uniformity helps the contrastive learning to learn separable features, however excessive pursuit to the uniformity makes the contrastive loss not tolerant to semantically similar samples, which may break the underlying semantic structure and be harmful to the formation of features useful for downstream tasks. This is caused by the inherent defect of the instance discrimination objective. Specifically, instance discrimination objective tries to push all different instances apart, ignoring the underlying relations between samples. Pushing semantically consistent samples apart has no positive effect for acquiring a prior informative to general downstream tasks. A well-designed contrastive loss should have some extents of tolerance to the closeness of semantically similar samples. Therefore, we find that the contrastive loss meets a uniformity-tolerance dilemma, and a good choice of temperature can compromise these two properties properly to both learn separable features and tolerant to semantically similar samples, improving the feature qualities and the downstream performances.
### What problem does this paper attempt to address?
This paper seeks to understand how the contrastive loss behaves in unsupervised contrastive learning. Specifically, the authors focus on the following points:
1. **Hardness-aware behaviour of the contrastive loss**: The paper shows that the contrastive loss is a hardness-aware loss function: it automatically concentrates on hard negative samples and penalizes each one according to its hardness. The temperature parameter \( \tau \) controls how strongly hard negatives are penalized. A small temperature places most of the penalty on the hardest negatives, which separates the local structure around each sample and makes the embedding distribution more uniform. A large temperature makes the loss less sensitive to hard negatives, and the hardness-aware behaviour fades.
2. **The contradiction between uniformity and tolerance**: The paper explores the contradiction between the uniformity of the embedding distribution and the tolerance for semantically similar samples. Uniformity helps to learn separable features, but excessive pursuit of uniformity will destroy the underlying semantic structure, which is harmful to downstream tasks. This is due to the inherent defect of the instance discrimination objective, that is, trying to push all different instances apart and ignoring the potential relationships between samples.
3. **Selection of temperature**: The paper argues that an appropriately chosen temperature keeps the features separable while retaining some tolerance for semantically similar samples, thereby improving feature quality and downstream performance.
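The effect of the temperature described in the points above can be illustrated with a small numerical sketch. It assumes the common softmax-form (InfoNCE-style) contrastive loss over cosine similarities; the function names `contrastive_loss` and `relative_penalty` and the similarity values are hypothetical and chosen only for illustration, not taken from the paper.

```python
import numpy as np

def contrastive_loss(pos_sim, neg_sims, tau):
    """Softmax-form contrastive loss for a single anchor (assumed form)."""
    logits = np.concatenate(([pos_sim], neg_sims)) / tau
    logits = logits - logits.max()  # subtract the max for numerical stability
    log_prob_pos = logits[0] - np.log(np.exp(logits).sum())
    return -log_prob_pos

def relative_penalty(neg_sims, tau):
    """Share of the total negative-sample gradient that falls on each negative."""
    z = np.asarray(neg_sims, dtype=float) / tau
    z = z - z.max()
    w = np.exp(z)
    return w / w.sum()

# Illustrative similarities of four negatives to the anchor, hardest first.
neg_sims = np.array([0.9, 0.5, 0.0, -0.5])

for tau in (0.07, 0.3, 1.0, 10.0):
    weights = np.round(relative_penalty(neg_sims, tau), 3)
    loss = contrastive_loss(1.0, neg_sims, tau)
    print(f"tau={tau:<5} loss={loss:.3f} penalty shares={weights}")
```

With a small temperature almost all of the penalty lands on the most similar (hardest) negative, while a large temperature spreads it nearly evenly, which is the loss of hardness-awareness that the summary describes.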
### Main contributions of the paper
1. **Analysis of the behavior of contrastive loss**: The paper shows that contrastive loss is a hardness-aware loss function and verifies the importance of the hardness-aware characteristics for the success of contrastive loss.
2. **The role of temperature**: Through gradient analysis, the paper shows that temperature is a key parameter for controlling the intensity of the penalty for hard negative samples, and this is verified by quantitative and qualitative experiments.
3. **The contradiction between uniformity and tolerance**: The paper reveals the contradiction between uniformity and tolerance in unsupervised contrastive learning, and points out that selecting an appropriate temperature can balance these two characteristics and significantly improve the feature quality.
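The gradient analysis mentioned in the contributions can be sketched as follows, assuming the standard softmax-form contrastive loss. The notation \( s_{ij} \) (similarity between anchor \( x_i \) and sample \( x_j \), with \( s_{ii} \) the positive pair), \( P_{ij} \), and \( r_{ij} \) is introduced here for illustration and may differ from the paper's own.

\[
\mathcal{L}(x_i) = -\log \frac{\exp(s_{ii}/\tau)}{\exp(s_{ii}/\tau) + \sum_{k \neq i} \exp(s_{ik}/\tau)},
\qquad
P_{ij} = \frac{\exp(s_{ij}/\tau)}{\exp(s_{ii}/\tau) + \sum_{k \neq i} \exp(s_{ik}/\tau)}
\]

\[
\frac{\partial \mathcal{L}(x_i)}{\partial s_{ii}} = -\frac{1}{\tau} \sum_{k \neq i} P_{ik},
\qquad
\frac{\partial \mathcal{L}(x_i)}{\partial s_{ij}} = \frac{1}{\tau} P_{ij} \quad (j \neq i)
\]

\[
r_{ij} = \frac{P_{ij}}{\sum_{k \neq i} P_{ik}} = \frac{\exp(s_{ij}/\tau)}{\sum_{k \neq i} \exp(s_{ik}/\tau)}
\]

The gradient on each negative grows exponentially with its similarity \( s_{ij} \), which is what makes the loss hardness-aware. The relative share \( r_{ij} \) is a softmax over the negatives with temperature \( \tau \): as \( \tau \to 0 \) it concentrates on the hardest negative, and as \( \tau \to \infty \) it approaches a uniform weighting over all negatives.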
### Related work
The paper reviews progress in unsupervised learning methods, especially those based on contrastive learning. These methods learn representations by maximizing the similarity between different views of the same instance and minimizing the similarity between different instances. In addition, some works attempt to explain the mechanism of contrastive learning through theoretical frameworks such as latent classes and mutual information.
### Experimental details
The paper conducts experiments on multiple datasets, including CIFAR10, CIFAR100, SVHN and ImageNet100. The experiments use ResNet-18 and ResNet-50 as the backbone networks, and evaluate the performance of the pre-trained models through linear classification tasks. The experimental results show that selecting an appropriate temperature can effectively balance the uniformity of the embedding distribution and the tolerance for semantically similar samples, thereby improving the performance of downstream tasks.
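To make the two quantities being balanced more concrete, the sketch below shows one possible way to measure them on L2-normalized embeddings. The summary does not state how the paper computes either quantity, so both definitions are assumptions: `uniformity` uses a Gaussian-potential measure that is common in prior work on uniformity, and `tolerance` is a hypothetical proxy, the mean cosine similarity between distinct samples that share a label. The random data stands in for real embeddings.

```python
import numpy as np

def normalize(x):
    """Project each row onto the unit hypersphere."""
    return x / np.linalg.norm(x, axis=1, keepdims=True)

def uniformity(z, t=2.0):
    """Log of the mean Gaussian potential over distinct pairs (assumed measure).

    Lower values indicate a more uniform spread on the sphere.
    """
    sq_dists = ((z[:, None, :] - z[None, :, :]) ** 2).sum(-1)
    iu = np.triu_indices(len(z), k=1)  # each unordered pair once
    return np.log(np.exp(-t * sq_dists[iu]).mean())

def tolerance(z, labels):
    """Mean cosine similarity over distinct same-label pairs (hypothetical proxy)."""
    sims = z @ z.T
    same = labels[:, None] == labels[None, :]
    iu = np.triu_indices(len(z), k=1)
    return sims[iu][same[iu]].mean()

rng = np.random.default_rng(0)
labels = rng.integers(0, 4, size=64)

# Two synthetic extremes: points spread over the sphere vs. nearly collapsed points.
spread = normalize(rng.normal(size=(64, 16)))
collapsed = normalize(np.ones((64, 16)) + 0.01 * rng.normal(size=(64, 16)))

for name, z in (("spread", spread), ("collapsed", collapsed)):
    print(f"{name:<9} uniformity={uniformity(z):.3f} tolerance={tolerance(z, labels):.3f}")
```

The two synthetic extremes illustrate the trade-off: widely spread embeddings score well on uniformity but keep same-label samples far apart, while collapsed embeddings are maximally tolerant but not separable.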