Local Alignment with Global Semantic Consistence Network for Image–Text Matching

Pengwei Li, Shihua Wu, Zhichao Lian
DOI: https://doi.org/10.1109/dasc/picom/cbdcom/cy55231.2022.9927900
2022-01-01
Abstract: Image-text matching, a major task in cross-modal information processing, measures the similarity between an image and a sentence. Existing methods fall mainly into two categories: global embedding and local alignment. Recently, local alignment methods that use fine-grained features to explore the correspondence between image regions and text words have achieved impressive results. However, these methods focus only on matching between salient objects and ignore the importance of global semantics. To solve this problem, we propose a novel Local Alignment with Global Semantic Consistence Network (LAGSC), which performs cross-modal matching at both the global and local levels. Our method provides supervisory information for image-text pairs through label vectors of image regions, so as to maintain global semantic consistency. Experimental results on two benchmark datasets, Flickr30K and MS-COCO, demonstrate the effectiveness of our method.