Abstract: Contrastive sentence representation learning has made great progress thanks to a range of text augmentation strategies and hard negative sampling techniques. However, most studies directly use in-batch samples as negatives and ignore the semantic relationship between those negatives and their anchors, which may introduce negative sampling bias. To address this issue, we propose similarity and relative-similarity strategies for identifying potential false negatives. Moreover, we introduce adaptive false negative elimination and attraction methods to mitigate their adverse effects. Our approaches can also be regarded as semi-supervised contrastive learning, since the identified false negatives can be treated as either negative or positive samples in the adaptive false negative elimination and attraction methods. By fusing information from positive and negative pairs, contrastive learning learns rich and discriminative representations that capture the intrinsic characteristics of a sentence. Experimental results indicate that our proposed strategies and methods yield further significant performance improvements. Specifically, combining the similarity strategy with the adaptive false negative elimination method achieves the best results, with an average performance gain of 2.1% over SimCSE on semantic textual similarity (STS) tasks. Furthermore, our approach is generalizable and can be applied to different text data augmentation strategies and to certain existing contrastive sentence representation learning models. Our code and data are publicly available at https://github.com/Linda230/AFNC.
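To make the idea of false negative elimination concrete, the following is a minimal sketch and not the released AFNC implementation. The abstract gives no formulas, so everything here is an assumption: the function names (`flag_false_negatives`, `info_nce`), the fixed cosine threshold of 0.9 used as a stand-in for the similarity strategy, and the temperature of 0.05. The relative-similarity strategy, the adaptive component, and the attraction method are not modeled.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def flag_false_negatives(anchors, views, threshold=0.9):
    """flags[i][j] is True when in-batch negative j looks like a false negative for anchor i.

    Hypothetical rule (an assumption, not the paper's): cosine similarity above a fixed threshold.
    """
    n = len(anchors)
    return [
        [i != j and cosine(anchors[i], views[j]) > threshold for j in range(n)]
        for i in range(n)
    ]


def info_nce(anchors, views, temperature=0.05, flags=None):
    """Mean InfoNCE loss; flagged negatives are dropped from each anchor's denominator."""
    n = len(anchors)
    total = 0.0
    for i in range(n):
        logits = [cosine(anchors[i], views[j]) / temperature for j in range(n)]
        # The positive (j == i) is never flagged, so it always stays in the denominator.
        kept = [j for j in range(n) if flags is None or not flags[i][j]]
        peak = max(logits[j] for j in kept)  # subtract the max for numerical stability
        log_denominator = peak + math.log(sum(math.exp(logits[j] - peak) for j in kept))
        total += log_denominator - logits[i]
    return total / n


# Toy batch: sentences 0 and 1 are near-duplicates, sentence 2 is unrelated.
batch = [[1.0, 0.0], [0.98, 0.2], [0.0, 1.0]]
flags = flag_false_negatives(batch, batch)
plain = info_nce(batch, batch)
cleaned = info_nce(batch, batch, flags=flags)
print(flags)
print(plain, cleaned)
```

In this toy batch the two near-duplicate sentences flag each other, so the loss no longer pushes them apart; an attraction variant would instead move the flagged pairs into the numerator as additional positives.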

Contrastive sentence representation learning with adaptive false negative cancellation
