Context‐aware relation enhancement and similarity reasoning for image‐text retrieval

Zheng Cui, Yongli Hu, Yanfeng Sun, Baocai Yin
DOI: https://doi.org/10.1049/cvi2.12270
IF: 1.484
2024-01-31
IET Computer Vision
Abstract: Image‐text retrieval is a fundamental yet challenging task that aims to bridge the semantic gap between heterogeneous data and to measure semantic similarity precisely. Fine‐grained alignment between cross‐modal features plays a key role in many successful methods. Nevertheless, existing methods do not effectively utilise intra‐modal information to enhance feature representations, and they lack similarity reasoning powerful enough to yield a precise similarity score. To tackle these issues, a context‐aware Relation Enhancement and Similarity Reasoning model, called RESR, is proposed for precise image‐text retrieval; it conducts both intra‐modal relation enhancement and inter‐modal similarity reasoning while considering global‐context information. For intra‐modal relation enhancement, a novel context‐aware graph convolutional network is introduced to enhance local feature representations by utilising relation and global‐context information. For inter‐modal similarity reasoning, local and global similarity features are exploited through the bidirectional alignment of image and text, and similarity reasoning is performed among the multi‐granularity similarity features. Finally, the refined local and global similarity features are adaptively fused to produce a precise similarity score. Experimental results show that the model outperforms several state‐of‐the‐art approaches, achieving average improvements of 2.5% and 6.3% in R@sum on the Flickr30K and MS‐COCO datasets, respectively.
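The abstract reports its results in R@sum. As commonly used in image‐text retrieval, R@sum adds up Recall@1, Recall@5 and Recall@10 for both retrieval directions (image to text and text to image), so its maximum is 600. The following is a minimal sketch of that metric, not the authors' evaluation code. It assumes a hypothetical square similarity matrix `sim` in which `sim[i][j]` scores image `i` against text `j` and the correct match lies on the diagonal; this is a simplification, since the benchmark datasets pair each image with several captions. The function names are illustrative.

```python
def recall_at_k(sim, k):
    """Percentage of queries (rows) whose true match (same index) ranks in the top k."""
    hits = 0
    for i, row in enumerate(sim):
        # Rank of the true match = number of candidates scoring strictly higher.
        rank = sum(1 for score in row if score > row[i])
        if rank < k:
            hits += 1
    return 100.0 * hits / len(sim)


def r_sum(sim, ks=(1, 5, 10)):
    """Sum of Recall@K over both retrieval directions, in percentage points."""
    # Transposing the matrix turns text columns into query rows.
    sim_t = [list(col) for col in zip(*sim)]
    return sum(recall_at_k(sim, k) + recall_at_k(sim_t, k) for k in ks)


# Toy example (assumed values): image 1 ranks a wrong text above its true match.
sim = [
    [0.9, 0.1, 0.2],
    [0.8, 0.3, 0.1],
    [0.2, 0.1, 0.7],
]
print(round(r_sum(sim), 2))
```

Because R@sum aggregates six recall values, a gain of a few points can come from either retrieval direction or from any of the three cut-offs.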
computer science, artificial intelligence; engineering, electrical & electronic