Fusion Layer Attention for Image-Text Matching

Depeng Wang, Liejun Wang, Shiji Song, Gao Huang, Yuchen Guo, Shuli Cheng, Naixiang Ao, Anyu Du
DOI: https://doi.org/10.1016/j.neucom.2021.01.124
IF: 6
2021-01-01
Neurocomputing
Abstract: Image-text matching aims to find the relationship between image and text data and to establish a connection between them. Its main challenge is that images and texts have different data distributions and feature representations. Current methods fall into two basic types: those that map image and text data into a common space and then apply a distance measurement, and those that treat image-text matching as a classification problem. In both cases, the only two data modes used are image and text. Our method adds a fusion layer that extracts intermediate modes, which improves the image-text matching results. We also propose a concise way to update the loss function that makes it easier for neural networks to handle difficult problems. The proposed method was verified on the Flickr30K and MS-COCO datasets and achieved superior matching results compared to existing methods.
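The first family of existing methods mentioned in the abstract (mapping both modalities into a common space and comparing by distance) can be illustrated with a minimal sketch. This is not the paper's fusion layer or its loss update, whose details the abstract does not give. The embeddings, the choice of cosine similarity, and the function names `cosine_similarity` and `rank_texts` are all assumptions made for illustration.

```python
import math

def cosine_similarity(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def rank_texts(image_vec, text_vecs):
    """Return text indices sorted from most to least similar to the image."""
    scores = [cosine_similarity(image_vec, t) for t in text_vecs]
    return sorted(range(len(text_vecs)), key=lambda i: scores[i], reverse=True)

# Hypothetical embeddings, assumed to already live in a shared 3-d space.
# In a real system they would come from learned image and text encoders.
image_embedding = [1.0, 0.0, 0.0]
text_embeddings = [
    [0.0, 1.0, 0.0],  # unrelated caption
    [0.9, 0.1, 0.0],  # closely matching caption
    [0.5, 0.5, 0.0],  # partially matching caption
]

ranking = rank_texts(image_embedding, text_embeddings)
print(ranking)
```

Retrieval in this setting reduces to sorting candidates by similarity, which is why the quality of the shared embedding space dominates the matching result.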