Globally Guided Confidence Enhancement Network for Image-Text Matching

Xin Dai, Gulanbaier Tuerhong, Mairidan Wushouer
DOI: https://doi.org/10.3390/app13095658
2023-01-01
Applied Sciences
Abstract: Image-text matching is a crucial aspect of multi-modal intelligence. The main challenge in this area is accurately measuring the relevance between an image and a text using the evidence obtained through matching. Previous studies have either concentrated on obtaining a well-represented global feature to measure similarity directly, or investigated complex matching patterns at the local level before aggregating them, with little attention paid to combining the two. We propose a Globally Guided Confidence Enhancement Network that combines both approaches, using a good global representation to guide fine-grained local interactions. In this process, content that better matches the text from a global perspective is enhanced and represented with confidence scores. Extensive experiments demonstrate that our approach achieves superior performance on the Flickr30K and MSCOCO datasets.
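The abstract does not specify the network's layers, so the following is only a minimal sketch of the general idea it describes (a global representation guiding local matching through confidence scores), not the authors' architecture. Every design choice here is an assumption made for illustration: the global text vector is a mean pool of word vectors, similarity is cosine, each region's confidence is a temperature-scaled softmax over its similarity to the global text vector, and each region's local evidence is its best-matching word. The function names `globally_guided_similarity`, `mean_pool`, `cosine`, and `softmax` are hypothetical helpers.

```python
import math
from typing import List

Vector = List[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def softmax(scores: List[float], temperature: float = 1.0) -> List[float]:
    """Numerically stable softmax; lower temperature sharpens the distribution."""
    peak = max(scores)
    exps = [math.exp((s - peak) / temperature) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def mean_pool(vectors: List[Vector]) -> Vector:
    """Assumed global representation: the element-wise mean of the vectors."""
    if not vectors:
        raise ValueError("need at least one vector to pool")
    n = len(vectors)
    return [sum(v[i] for v in vectors) / n for i in range(len(vectors[0]))]


def globally_guided_similarity(
    regions: List[Vector], words: List[Vector], temperature: float = 0.1
) -> float:
    """Hypothetical image-text score: local evidence weighted by global confidence."""
    if not regions or not words:
        raise ValueError("regions and words must be non-empty")
    # Global guidance: how well each region matches the sentence as a whole.
    text_global = mean_pool(words)
    confidences = softmax([cosine(r, text_global) for r in regions], temperature)
    # Fine-grained local evidence: each region's best-matching word.
    local = [max(cosine(r, w) for w in words) for r in regions]
    # Regions that agree with the global text view dominate the final score.
    return sum(c * s for c, s in zip(confidences, local))
```

For example, with toy 3-dimensional features, an image containing one region aligned with the caption scores higher than an image whose regions are unrelated to it:

```python
words = [[1.0, 0.0, 0.0], [0.9, 0.1, 0.0]]
matched_image = [[1.0, 0.0, 0.0], [0.0, 0.0, 1.0]]    # relevant region + background
unrelated_image = [[0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
print(globally_guided_similarity(matched_image, words) >
      globally_guided_similarity(unrelated_image, words))  # True
```

The low default temperature lets a region that strongly matches the global text view dominate the aggregation, so a background region contributes almost nothing; a higher temperature would spread the weights more evenly across regions.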
What problem does this paper attempt to address?