Learning From Noisy Correspondence With Tri-Partition for Cross-Modal Matching
Zerun Feng, Zhimin Zeng, Caili Guo, Zheng Li, Lin Hu
DOI: https://doi.org/10.1109/tmm.2023.3318002
IF: 7.3
2023-01-01
IEEE Transactions on Multimedia
Abstract: Due to high labeling cost, a certain proportion of noisy correspondence is inevitably introduced into visual-text datasets, resulting in poor model robustness for cross-modal matching. Although recent methods that divide the datasets into clean and noisy pair subsets have achieved promising results, they still suffer from deep neural networks over-fitting on noisy correspondence. In particular, similar positive pairs with partially relevant semantic correspondence are easily partitioned into the noisy pair subset by mistake when they are not carefully selected, which harms robust learning. Meanwhile, similar negative pairs with partially relevant semantic correspondence lead to ambiguous distance relations in common space learning, which also damages the stability of performance. To solve this coarse-grained dataset division problem, we propose the Correspondence Tri-Partition Rectifier (CTPR), which partitions the training set into clean, hard, and noisy pair subsets based on the memorization effect of neural networks and prediction inconsistency. Then, we refine the correspondence labels for each subset to indicate the real semantic correspondence between visual-text pairs. The differences between the rectified labels of anchors and hard negatives are recast as the adaptive margin in an improved triplet loss for robust training in a co-teaching manner. To verify the effectiveness and robustness of our method, we conduct experiments on image-text and video-text matching as two showcases. Extensive experiments on the Flickr30K, MS-COCO, MSR-VTT, and LSMDC datasets verify that our method successfully partitions the visual-text pairs according to their semantic correspondence and improves performance when training on noisy data.
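The abstract names two mechanisms without giving formulas: a three-way split driven by prediction inconsistency, and a triplet loss whose margin depends on the difference between rectified labels. The following is a minimal sketch of how such pieces could look, not the paper's implementation. The function names, the 0.5 threshold, the agreement rule between two networks, and the linear margin scaling with `max_margin` are all assumptions made for illustration.

```python
def tri_partition(p_clean_a, p_clean_b, threshold=0.5):
    """Assign each pair to 'clean', 'hard', or 'noisy'.

    p_clean_a and p_clean_b are hypothetical per-pair probabilities of being
    clean, as estimated by two co-trained networks. Assumed rule: agreement
    gives 'clean' or 'noisy'; disagreement (inconsistent predictions) gives 'hard'.
    """
    labels = []
    for pa, pb in zip(p_clean_a, p_clean_b):
        a_clean = pa >= threshold
        b_clean = pb >= threshold
        if a_clean and b_clean:
            labels.append("clean")
        elif not a_clean and not b_clean:
            labels.append("noisy")
        else:
            labels.append("hard")
    return labels


def adaptive_triplet_loss(sim_pos, sim_neg, y_pos, y_neg, max_margin=0.2):
    """Hinge-style triplet loss with a label-dependent margin.

    y_pos and y_neg are rectified soft correspondence labels in [0, 1] for the
    anchor pair and the hard negative. Assumed form: the margin grows linearly
    with their difference and is never negative.
    """
    margin = max_margin * max(0.0, y_pos - y_neg)
    return max(0.0, margin - sim_pos + sim_neg)


if __name__ == "__main__":
    print(tri_partition([0.9, 0.1, 0.8], [0.8, 0.2, 0.3]))
    print(adaptive_triplet_loss(0.7, 0.6, 1.0, 0.0))
```

Under these assumptions, a negative whose rectified label is close to the anchor's (a partially relevant pair) yields a margin near zero, so it is pushed away less strongly than a clearly irrelevant negative.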
computer science, information systems,telecommunications, software engineering