Abstract: Most existing image-text matching methods adopt triplet loss as the optimization objective, and choosing a proper negative sample for the triplet of <anchor, positive, negative> is important for training the model effectively; for example, hard negatives make the model learn efficiently. However, we observe that existing methods mainly employ the most similar samples as hard negatives, which may not be true negatives. In other words, samples with high similarity that are not paired with the anchor may retain positive semantic associations, and we call them false negatives. Repelling these false negatives in triplet loss would mislead the semantic representation learning and result in inferior retrieval performance. In this paper, we propose a novel False Negative Elimination (FNE) strategy to select negatives via sampling, which could alleviate the problem introduced by false negatives. Specifically, we first construct the distributions of positive and negative samples separately via their similarities with the anchor, based on the features extracted from image and text encoders. Then we calculate the false negative probability of a given sample based on its similarity with the anchor and the above distributions via Bayes' rule, which is employed as the sampling weight during the negative sampling process. Since a small batch may not contain any false negative, we design a memory module with momentum to retain a large negative buffer and implement our negative sampling strategy over the buffer. In addition, to make the model focus on hard negatives, we reassign the sampling weights for the simple negatives with a cut-down strategy. Extensive experiments are conducted on Flickr30K and MS-COCO, and the results demonstrate the superiority of our proposed false negative elimination strategy.
The code is available at https://github.com/LuminosityX/FNE.
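
The Bayes' rule step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not state which distribution family is fitted, what prior is used, or how the false negative probability is mapped to a sampling weight. The sketch assumes Gaussian fits to the positive and negative similarity scores, a hand-picked prior `prior_pos`, and a weight of `1 - p_fn`; the function names are hypothetical.

```python
import numpy as np


def gaussian_pdf(x, mean, std):
    """Density of a normal distribution, evaluated elementwise."""
    z = (x - mean) / std
    return np.exp(-0.5 * z * z) / (std * np.sqrt(2.0 * np.pi))


def false_negative_probability(sim, pos_sims, neg_sims, prior_pos=0.01):
    """Estimate P(semantically positive | similarity) with Bayes' rule.

    Assumptions (not taken from the abstract): both similarity
    distributions are modelled as Gaussians, and prior_pos is a
    hand-picked prior probability that an unpaired sample is positive.
    """
    sim = np.asarray(sim, dtype=float)
    # Likelihood of the observed similarity under each distribution.
    like_pos = gaussian_pdf(sim, np.mean(pos_sims), np.std(pos_sims) + 1e-8)
    like_neg = gaussian_pdf(sim, np.mean(neg_sims), np.std(neg_sims) + 1e-8)
    numerator = like_pos * prior_pos
    evidence = numerator + like_neg * (1.0 - prior_pos)
    return numerator / np.maximum(evidence, 1e-12)


def sampling_weights(sim, pos_sims, neg_sims, prior_pos=0.01):
    """Turn false negative probabilities into normalized sampling weights.

    Assumption: a candidate is weighted by 1 - p_fn, so likely false
    negatives are rarely drawn as negatives.
    """
    p_fn = false_negative_probability(sim, pos_sims, neg_sims, prior_pos)
    weights = 1.0 - p_fn
    return weights / weights.sum()
```

With made-up scores such as `pos_sims = [0.7, 0.8, 0.9]` and `neg_sims = [0.1, 0.2, 0.3]`, a candidate at similarity 0.85 receives a false negative probability close to 1 and therefore a sampling weight close to 0, while a candidate at 0.1 keeps almost its full weight.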

What problem does this paper attempt to address?

### The Problem the Paper Attempts to Solve

The paper addresses a critical issue in the image-text matching task: the impact of **False Negatives**. Existing image-text matching methods typically use triplet loss as the optimization objective, and selecting appropriate negative samples is crucial for training the model effectively. However, current methods mainly choose the samples most similar to the anchor as hard negatives. Although these samples are not paired with the anchor, they may retain positive semantic associations, which makes them false negatives. Specifically, the paper points out:

1. **Definition of False Negatives**: False negatives are samples that are labeled as negative in the dataset but actually match the anchor sample semantically.
2. **Impact of False Negatives**: Repelling these false negatives in triplet loss misguides the learning of semantic representations and degrades retrieval performance.

To mitigate this issue, the paper proposes a new **False Negative Elimination (FNE) strategy**, which selects negatives by sampling, using each candidate's estimated false negative probability as the sampling weight. This makes false negatives less likely to be chosen as negatives, improving the model's semantic representations and its image-text matching performance.
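
To make the impact of false negatives concrete, the following sketch contrasts picking the hardest negative with drawing a negative by weight. It is an illustration only: the hinge form of the triplet loss is standard, but the margin of 0.2, the function names, and the example scores are assumptions rather than values from the paper.

```python
import numpy as np


def triplet_hinge(sim_pos, sim_neg, margin=0.2):
    """Hinge-based triplet loss on similarity scores (margin is assumed)."""
    return max(0.0, margin + sim_neg - sim_pos)


def hardest_negative(neg_sims):
    """Index of the unpaired sample most similar to the anchor."""
    return int(np.argmax(neg_sims))


def sample_negative(weights, rng):
    """Draw one negative index with probability proportional to its weight."""
    weights = np.asarray(weights, dtype=float)
    return int(rng.choice(len(weights), p=weights / weights.sum()))


# Hypothetical scores: the last candidate is a false negative, that is,
# unpaired with the anchor but semantically matching it.
sim_pos = 0.80
neg_sims = [0.30, 0.55, 0.78]

# Hardest-negative mining selects the false negative and pushes it away.
hard_idx = hardest_negative(neg_sims)
hard_loss = triplet_hinge(sim_pos, neg_sims[hard_idx])

# Weighted sampling with zero weight on the likely false negative avoids it.
rng = np.random.default_rng(0)
sampled_idx = sample_negative([0.5, 0.5, 0.0], rng)
sampled_loss = triplet_hinge(sim_pos, neg_sims[sampled_idx])
```

Here hardest-negative mining picks the false negative and produces a nonzero loss that pushes a semantically matching sample away from the anchor, whereas the weighted draw can only return one of the first two candidates.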

Your Negative May not Be True Negative: Boosting Image-Text Matching with False Negative Elimination

Negative Sample is Negative in Its Own Way: Tailoring Negative Sentences for Image-Text Retrieval

The Dilemma of TriHard Loss and an Element-Weighted TriHard Loss for Person Re-Identification

A Multiple Positives Enhanced NCE Loss for Image-Text Retrieval

Negative-Aware Attention Framework for Image-Text Matching

Rethinking InfoNCE: How Many Negative Samples Do You Need?

Enhancing Recommender Systems: A Strategy to Mitigate False Negative Impact

Selectively Hard Negative Mining for Alleviating Gradient Vanishing in Image-Text Matching

Simplify and Robustify Negative Sampling for Implicit Collaborative Filtering

SimANS: Simple Ambiguous Negatives Sampling for Dense Text Retrieval

Augmented Negative Sampling for Collaborative Filtering

Better Sampling of Negatives for Distantly Supervised Named Entity Recognition

Integrating Language Guidance into Image-Text Matching for Correcting False Negatives

Generating Enhanced Negatives for Training Language-Based Object Detectors

UFNRec: Utilizing False Negative Samples for Sequential Recommendation

Negative Token Merging: Image-based Adversarial Feature Guidance

Adaptive Hardness Negative Sampling for Collaborative Filtering

Sample-Specific Debiasing for Better Image-Text Models

Entity Similarity-Based Negative Sampling for Knowledge Graph Embedding

Negative Samples Mining Matters: Reconsidering Hyperspectral Image Classification with Contrastive Learning

ITContrast: contrastive learning with hard negative synthesis for image-text matching