A Multiple Positives Enhanced NCE Loss for Image-Text Retrieval

Yi Li, Dehao Wu, Yuesheng Zhu
DOI: https://doi.org/10.1007/978-3-030-98358-1_34
2022-01-01
Abstract: Image-Text Retrieval (ITR) enables users to retrieve relevant content across modalities and has attracted considerable attention. Existing approaches typically use contrastive loss functions to perform contrastive learning in a common embedding space, aiming to pull semantically related pairs closer while pushing unrelated pairs apart. However, we argue that this behaviour is too strict: these approaches fail to address the inherent misalignments arising from potentially semantically related samples. For example, a batch commonly contains more than one positive sample for a given query, yet previous methods push these samples apart even though they are semantically related, which leads to a sub-optimal and contradictory optimization direction and in turn degrades retrieval performance. In this paper, a Multiple Positives Enhanced Noise Contrastive Estimation learning objective is proposed to alleviate the diversion noise by leveraging and jointly optimizing multiple positive pairs for each sample in a mini-batch. We demonstrate the effectiveness of our approach on the MS-COCO and Flickr30K datasets for image-to-text and text-to-image retrieval.
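To make the idea of several in-batch positives concrete, the following is a minimal sketch of a generic multi-positive InfoNCE-style loss for the image-to-text direction. It is not the paper's exact objective, which the abstract does not specify. The function name `multi_positive_nce`, the `pos_mask` argument, the temperature value, and the choice to sum the softmax mass over positives are all assumptions made for illustration.

```python
import numpy as np

def multi_positive_nce(img_emb, txt_emb, pos_mask, temperature=0.07):
    """Illustrative image-to-text contrastive loss with several positives per query.

    img_emb:  (B, D) image embeddings
    txt_emb:  (B, D) text embeddings
    pos_mask: (B, B) boolean; pos_mask[i, j] is True when text j is treated
              as semantically related to image i (hypothetical input; each
              row needs at least one True entry)
    """
    # L2-normalise so that dot products are cosine similarities.
    img = img_emb / np.linalg.norm(img_emb, axis=1, keepdims=True)
    txt = txt_emb / np.linalg.norm(txt_emb, axis=1, keepdims=True)
    logits = img @ txt.T / temperature

    # Numerically stable log-softmax over all texts in the batch.
    logits = logits - logits.max(axis=1, keepdims=True)
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))

    # Assumption: the numerator sums the probability mass of all positives,
    # so related texts are no longer treated as negatives for the query.
    pos_log_prob = np.log((np.exp(log_prob) * pos_mask).sum(axis=1))
    return -pos_log_prob.mean()
```

With `pos_mask` set to the identity matrix, this reduces to the usual single-positive InfoNCE loss. Marking additional related captions as positives removes the penalty for placing them near the query, which is the contradiction the abstract describes.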