RFE-SRN: Image-text similarity reasoning network based on regional feature enhancement

Xiaoyu Yang, Chao Li, Dongzhong Zheng, Peng Wen, Guangqiang Yin
DOI: https://doi.org/10.1016/j.neucom.2022.11.003
IF: 6
2023-01-01
Neurocomputing
Abstract: Image-text matching aims to connect visual and natural language information. Some current methods have made great progress by using global alignment between images and sentences and local alignment between image regions and their corresponding words. However, the importance of the correlation between global alignment and local alignment is ignored to some extent. Therefore, in this paper, we propose a new image-text similarity reasoning network based on regional feature enhancement. First, the image region features are enhanced by a graph convolutional neural network, which finds the correlations among different image region features and produces the semantic relationships between them. Second, we propose a vector-based similarity representation to describe local and global alignment in a more comprehensive way. Finally, a graph convolutional neural network is introduced to construct a similarity graph that propagates the correlation between local alignment and global alignment to every part. Tested on the MSCOCO and Flickr30k datasets, our proposed method shows strong accuracy and competitiveness.
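The first two steps of the abstract can be illustrated with a minimal, dependency-free sketch. It is not the authors' implementation: the abstract gives no formulas, so the softmax-affinity graph, the residual connection, the squared-difference similarity vector, and all names (`enhance_regions`, `similarity_vector`, `weight`, `proj`) are assumptions chosen for illustration. The third step, reasoning over a similarity graph, is omitted.

```python
import math
import random

def softmax(row):
    # Numerically stable softmax over one list of scores.
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]

def matmul(a, b):
    # Plain matrix product of two lists-of-lists.
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def l2_normalize(vec, eps=1e-8):
    norm = math.sqrt(sum(x * x for x in vec)) + eps
    return [x / norm for x in vec]

def enhance_regions(regions, weight):
    """One graph-convolution step over a fully connected region graph (assumed form)."""
    # Edge weights: row-wise softmax of pairwise dot products between regions.
    transposed = [list(col) for col in zip(*regions)]
    affinity = [softmax(row) for row in matmul(regions, transposed)]
    # Aggregate neighbours, apply a linear map, then add a residual connection.
    aggregated = matmul(matmul(affinity, regions), weight)
    return [[r + a for r, a in zip(reg, agg)] for reg, agg in zip(regions, aggregated)]

def similarity_vector(x, y, proj):
    """Vector-valued similarity (assumed form): project the element-wise
    squared difference of x and y, then L2-normalise the result."""
    diff_sq = [(a - b) ** 2 for a, b in zip(x, y)]
    projected = matmul([diff_sq], proj)[0]
    return l2_normalize(projected)

# Toy usage with random data: 5 regions of dimension 4, similarity dimension 3.
random.seed(0)
d, m = 4, 3
regions = [[random.gauss(0, 1) for _ in range(d)] for _ in range(5)]
weight = [[random.gauss(0, 0.1) for _ in range(d)] for _ in range(d)]
proj = [[random.gauss(0, 1) for _ in range(m)] for _ in range(d)]
word = [random.gauss(0, 1) for _ in range(d)]

enhanced = enhance_regions(regions, weight)
sim = similarity_vector(enhanced[0], word, proj)
```

Here `sim` is a vector rather than a single scalar score, which is the point of a vector-based similarity representation: a later module can reason over several such vectors (local region-word ones and a global image-sentence one) before reducing them to a final matching score.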