Abstract: Because labeling is costly, visual-text datasets inevitably contain a certain proportion of noisy correspondence, which degrades the robustness of cross-modal matching models. Although recent methods achieve promising results by dividing datasets into clean and noisy pair subsets, they still suffer from deep neural networks overfitting to noisy correspondence. In particular, similar positive pairs with partially relevant semantic correspondence are easily misassigned to the noisy pair subset when selection is not careful, which harms robust learning. Meanwhile, similar negative pairs with partially relevant semantic correspondence lead to ambiguous distance relations in common space learning, which also destabilizes performance. To address this coarse-grained dataset division problem, we propose the Correspondence Tri-Partition Rectifier (CTPR), which partitions the training set into clean, hard, and noisy pair subsets based on the memorization effect of neural networks and prediction inconsistency. We then refine the correspondence labels of each subset to indicate the real semantic correspondence between visual-text pairs. The differences between the rectified labels of anchors and hard negatives are recast as an adaptive margin in an improved triplet loss for robust training in a co-teaching manner. To verify the effectiveness and robustness of our method, we conduct experiments on image-text and video-text matching as two showcases. Extensive experiments on the Flickr30K, MS-COCO, MSR-VTT, and LSMDC datasets verify that our method successfully partitions visual-text pairs according to their semantic correspondence and improves performance when training on noisy data.
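
The two ideas named in the abstract, a three-way split of training pairs and a triplet loss whose margin depends on rectified labels, can be illustrated with a minimal sketch. This is not the CTPR implementation: the abstract does not give the partition rule or the margin formula, so everything below is an assumption made for illustration. In particular, the function names, the `threshold` and `base_margin` values, the use of per-network "clean" probabilities as inputs, the reading of "prediction inconsistency" as disagreement between two co-trained networks, and the margin form `base_margin * max(label_pos - label_neg, 0)` are all hypothetical.

```python
from typing import List


def tri_partition(p_clean_a: List[float], p_clean_b: List[float],
                  threshold: float = 0.5) -> List[str]:
    """Assign each pair to 'clean', 'hard', or 'noisy' (illustrative rule only).

    p_clean_a / p_clean_b are hypothetical per-pair probabilities of being
    clean, one list per co-trained network (for example, derived from
    per-sample losses, following the memorization effect).
    Assumed rule: both networks say clean -> 'clean'; both say noisy ->
    'noisy'; the networks disagree -> 'hard'.
    """
    subsets = []
    for pa, pb in zip(p_clean_a, p_clean_b):
        a_clean, b_clean = pa > threshold, pb > threshold
        if a_clean and b_clean:
            subsets.append("clean")
        elif not a_clean and not b_clean:
            subsets.append("noisy")
        else:
            subsets.append("hard")
    return subsets


def adaptive_margin_triplet(sim_pos: float, sim_neg: float,
                            label_pos: float, label_neg: float,
                            base_margin: float = 0.2) -> float:
    """Hinge triplet loss with a label-dependent margin (assumed form).

    sim_pos: similarity between the anchor and its paired sample.
    sim_neg: similarity between the anchor and a hard negative.
    label_pos / label_neg: rectified soft correspondence labels in [0, 1].
    The margin shrinks as the negative's label approaches the anchor's,
    so partially relevant negatives are pushed away less aggressively.
    """
    margin = base_margin * max(label_pos - label_neg, 0.0)
    return max(margin + sim_neg - sim_pos, 0.0)


if __name__ == "__main__":
    print(tri_partition([0.9, 0.1, 0.8], [0.8, 0.2, 0.3]))
    print(adaptive_margin_triplet(0.7, 0.6, label_pos=1.0, label_neg=0.0))
    print(adaptive_margin_triplet(0.7, 0.6, label_pos=1.0, label_neg=1.0))
```

Under these assumptions, a negative whose rectified label equals the anchor's contributes a zero margin, so the loss only penalizes it when it is scored higher than the positive.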

Learning with Noisy Correspondence

Noisy Correspondence Learning with Self-Reinforcing Errors Mitigation

NAC: Mitigating Noisy Correspondence in Cross-Modal Matching Via Neighbor Auxiliary Corrector

Noisy Correspondence Learning with Meta Similarity Correction

Cross-Modal Retrieval With Noisy Correspondence via Consistency Refining and Mining

Adaptive Integration of Partial Label Learning and Negative Learning for Enhanced Noisy Label Learning

Mitigating Noisy Correspondence by Geometrical Structure Consistency Learning

Disentangled Noisy Correspondence Learning

Learning From Noisy Correspondence With Tri-Partition for Cross-Modal Matching

Learning With Noisy Labels Over Imbalanced Subpopulations

Robust Object Re-identification with Coupled Noisy Labels

Channel-Wise Contrastive Learning for Learning with Noisy Labels

Countering Noisy Labels by Learning from Auxiliary Clean Labels

Learning with Noisy Labels Via Self-supervised Adversarial Noisy Masking

Noisy Pair Corrector for Dense Retrieval

Fine-Grained Classification with Noisy Labels

PC$^2$: Pseudo-Classification Based Pseudo-Captioning for Noisy Correspondence Learning in Cross-Modal Retrieval

Symmetric Cross Entropy for Robust Learning with Noisy Labels

Graph Matching with Bi-level Noisy Correspondence