Contrastive Learning with Negative Sampling Correction

Lu Wang, Chao Du, Pu Zhao, Chuan Luo, Zhangchi Zhu, Bo Qiao, Wei Zhang, Qingwei Lin, Saravan Rajmohan, Dongmei Zhang, Qi Zhang
2024-01-13
Abstract: As one of the most effective self-supervised representation learning methods, contrastive learning (CL) relies on multiple negative pairs to contrast against each positive pair. In the standard practice of contrastive learning, data augmentation methods are utilized to generate both positive and negative pairs. While existing works have been focusing on improving the positive sampling, the negative sampling process is often overlooked. In fact, the generated negative samples are often polluted by positive samples, which leads to a biased loss and performance degradation. To correct the negative sampling bias, we propose a novel contrastive learning method named Positive-Unlabeled Contrastive Learning (PUCL). PUCL treats the generated negative samples as unlabeled samples and uses information from positive samples to correct bias in contrastive loss. We prove that the corrected loss used in PUCL only incurs a negligible bias compared to the unbiased contrastive loss. PUCL can be applied to general contrastive learning problems and outperforms state-of-the-art methods on various image and graph classification tasks. The code of PUCL is in the supplementary file.
Machine Learning
What problem does this paper attempt to address?
This paper addresses negative sampling in contrastive learning. In standard practice, data augmentation is used to generate both positive and negative pairs. However, the generated negative samples are often contaminated by positive samples, which biases the loss and degrades performance. To correct this negative sampling bias, the paper proposes Positive-Unlabeled Contrastive Learning (PUCL). PUCL treats the generated negative samples as unlabeled samples and uses information from positive samples to correct the bias in the contrastive loss. The paper proves that the corrected loss used in PUCL incurs only a negligible bias compared to the unbiased contrastive loss. PUCL can be applied to general contrastive learning problems, and it outperforms state-of-the-art methods on various image and graph classification tasks. In summary, the paper aims to reduce the loss bias and performance degradation caused by contaminated negative samples, thereby improving the quality of the learned representations.
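To make the idea of treating negatives as unlabeled concrete, here is a minimal sketch of a generic positive-unlabeled style correction. It is not the exact PUCL loss, which the text above does not spell out. It assumes the unlabeled samples are a mixture of positives (with an assumed class prior `pi`) and true negatives, so the negative term can be estimated by subtracting the positive contribution. The function names, the temperature `t`, and the clamp are all illustrative assumptions.

```python
import math


def corrected_negative_term(pos_sim, unl_sims, pi, t=0.5):
    """Estimate the mean exp-similarity over true negatives.

    Assumption (hypothetical, for illustration): the unlabeled distribution is
    pi * positive + (1 - pi) * negative, which gives
    E_neg[g] = (E_unl[g] - pi * E_pos[g]) / (1 - pi).
    """
    g_pos = math.exp(pos_sim / t)
    g_unl = sum(math.exp(s / t) for s in unl_sims) / len(unl_sims)
    estimate = (g_unl - pi * g_pos) / (1.0 - pi)
    # For similarities in [-1, 1], exp(sim / t) is at least exp(-1 / t),
    # so clamp the estimate to keep the loss well defined.
    return max(estimate, math.exp(-1.0 / t))


def corrected_contrastive_loss(pos_sim, unl_sims, pi, t=0.5):
    """InfoNCE-style loss with the negative term replaced by the corrected estimate."""
    g_pos = math.exp(pos_sim / t)
    neg = corrected_negative_term(pos_sim, unl_sims, pi, t)
    return -math.log(g_pos / (g_pos + len(unl_sims) * neg))


# One anchor: similarity to its positive, and to four "negatives" drawn by
# augmentation, one of which (0.8) looks like a hidden positive.
pos_sim = 0.9
unl_sims = [0.8, -0.2, 0.1, -0.5]

uncorrected = corrected_contrastive_loss(pos_sim, unl_sims, pi=0.0)
corrected = corrected_contrastive_loss(pos_sim, unl_sims, pi=0.1)
print(f"uncorrected: {uncorrected:.4f}, corrected: {corrected:.4f}")
```

With `pi=0.0` the sketch reduces to the usual contrastive loss over all sampled negatives. With a positive prior, the hidden positive among the unlabeled samples contributes less to the denominator, so the anchor is penalized less for being close to it.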