ScoreCL: Augmentation-Adaptive Contrastive Learning via Score-Matching Function

Jin-Young Kim, Soonwoo Kwon, Hyojun Go, Yunsung Lee, Seungtaek Choi, Hyun-Gyoon Kim
2024-07-15
Abstract: Self-supervised contrastive learning (CL) has achieved state-of-the-art performance in representation learning by minimizing the distance between positive pairs while maximizing that of negative ones. Recently, it has been verified that the model learns better representation with diversely augmented positive pairs because they enable the model to be more view-invariant. However, only a few studies on CL have considered the difference between augmented views, and have not gone beyond the hand-crafted findings. In this paper, we first observe that the score-matching function can measure how much data has changed from the original through augmentation. With the observed property, every pair in CL can be weighted adaptively by the difference of score values, resulting in boosting the performance of the existing CL method. We show the generality of our method, referred to as ScoreCL, by consistently improving various CL methods, SimCLR, SimSiam, W-MSE, and VICReg, up to 3%p in k-NN evaluation on CIFAR-10, CIFAR-100, and ImageNet-100. Moreover, we have conducted exhaustive experiments and ablations, including results on diverse downstream tasks, comparison with possible baselines, and improvement when used with other proposed augmentation methods. We hope our exploration will inspire more research in exploiting the score matching for CL.
Computer Vision and Pattern Recognition
What problem does this paper attempt to address?
The paper addresses a gap in contrastive learning (CL): most existing work overlooks how much the augmented views of a sample differ from one another. Earlier studies showed that diversely augmented positive pairs improve the learned representation, but they rely on hand-crafted strategies and do not adjust the learning process dynamically according to the difference between views. The authors observe that the score-matching function can measure how much a sample has changed from the original through augmentation, and they build a framework called ScoreCL on this observation. ScoreCL weights every pair adaptively by the difference between the score values of its views. Experiments on CIFAR-10, CIFAR-100, and ImageNet-100 show that ScoreCL consistently improves several existing CL methods (SimCLR, SimSiam, W-MSE, and VICReg), with gains of up to 3 percentage points (3%p) in k-NN evaluation. In short, the main contributions of this paper are:
1. Proposing a score-based adaptive contrastive loss function that can be flexibly applied under different augmentation strategies.
2. Demonstrating the effectiveness of the proposed method through extensive experiments, especially when combined with recent contrastive learning methods and large-scale datasets.
3. Analyzing for the first time the characteristics of the score-matching function in measuring augmentation scales.
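To make the idea of weighting pairs by score difference more concrete, here is a minimal Python sketch. It is not the authors' implementation. It assumes that a scalar score value per augmented view has already been computed (for example, the norm of a score-matching model's output), and it uses a hypothetical weighting rule, 1 + |s_a - s_b| rescaled to mean 1, on top of a SimCLR-style InfoNCE loss. The names `pair_weights` and `weighted_info_nce`, the toy values, and the temperature are illustrative assumptions. The exact weighting used in ScoreCL may differ.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def pair_weights(scores_a, scores_b):
    """Hypothetical rule (an assumption, not taken from the paper):
    weight = 1 + |s_a - s_b|, rescaled so the batch mean is 1."""
    raw = [1.0 + abs(a - b) for a, b in zip(scores_a, scores_b)]
    mean = sum(raw) / len(raw)
    return [r / mean for r in raw]


def weighted_info_nce(z_a, z_b, weights, temperature=0.5):
    """Weighted mean of per-pair InfoNCE losses (view a against view b).

    z_a[i] and z_b[i] are embeddings of two views of sample i;
    all other rows of z_b act as negatives for z_a[i].
    """
    losses = []
    for i, anchor in enumerate(z_a):
        logits = [cosine(anchor, other) / temperature for other in z_b]
        top = max(logits)  # subtract the max for a stable log-sum-exp
        log_denominator = top + math.log(sum(math.exp(l - top) for l in logits))
        losses.append(log_denominator - logits[i])
    return sum(w * l for w, l in zip(weights, losses)) / len(losses)


# Toy batch of three samples with 2-D embeddings (made-up values).
z_a = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
z_b = [[0.9, 0.1], [0.2, 0.8], [0.7, 0.6]]

# Made-up per-view score values; pair 1 has the largest difference.
scores_a = [0.2, 0.5, 0.1]
scores_b = [0.3, 1.5, 0.1]

weights = pair_weights(scores_a, scores_b)
loss = weighted_info_nce(z_a, z_b, weights)
print("weights:", weights)
print("weighted loss:", loss)
```

In this sketch the pair whose two views have the most different score values contributes most to the loss, while rescaling the weights to mean 1 keeps the overall loss scale comparable to the unweighted case. The same per-pair weights could in principle multiply the pairwise terms of other objectives such as SimSiam or VICReg, which is one way to read the summary's claim that the method applies across CL frameworks.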