Interclass-Relativity-Adaptive Metric Learning for Cross-Modal Matching and Beyond

Feiyu Chen, Jie Shao, Yonghui Zhang, Xing Xu, Heng Tao Shen
DOI: https://doi.org/10.1109/tmm.2020.3019710
IF: 7.3
2021-01-01
IEEE Transactions on Multimedia
Abstract: Training under the supervision of a triplet ranking loss is a dominant methodology for cross-modal matching models, yet well-performing losses in this domain remain largely under-explored, since the majority of advanced metric losses are inapplicable owing to the particularity of the cross-modal setting. Current prominent metric learning approaches have developed various weighting schemes that assign weights to individual positive or negative samples. It is the interclass relative order within a triplet, however, that matters. In this work, we propose a new Interclass-Relativity-Adaptive (IRA) loss that assigns weights to the relative similarities between positive and negative pairs rather than to separate pairs, which allows us to treat a whole triplet as a weighable entity and make maximum use of the sole positive under the cross-modal setting. Our method outperforms the baselines by a large margin and obtains competitive results on two video-text matching benchmarks and two image-text matching benchmarks. We further extend our method to two unimodal image retrieval benchmarks to test its generality and achieve new state-of-the-art results.
computer science, information systems, telecommunications, software engineering