Multilevel Attention for Referring Expression Comprehension

Yanfeng Sun, Yunru Zhang, Huajie Jiang, Yongli Hu, Baocai Yin
DOI: https://doi.org/10.1016/j.patrec.2023.07.005
IF: 4.757
2023-01-01
Pattern Recognition Letters
Abstract: Referring expression comprehension aims to locate the target object in an image that a referring expression describes, a task in which extracting semantic and discriminative visual information plays an important role. Most existing methods ignore either attribute information or context information during model learning, resulting in less effective visual features. In this paper, we propose a Multi-level Attention Network (MANet) that extracts the target object's attribute information and the surrounding context information simultaneously. The Attribute Attention Module is designed to extract the fine-grained visual information related to the referring expression, and the Context Attention Module is designed to merge the context information of the surroundings to learn more discriminative visual features. Experiments on various common benchmark datasets show the effectiveness of our approach. © 2023 Elsevier B.V. All rights reserved.
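The abstract does not say how the two modules are implemented. As a rough illustration of the general idea only, the sketch below assumes plain scaled dot-product attention, with the expression embedding used as the query over (a) fine-grained features inside a candidate region and (b) features of its surrounding regions. All names (`attend`, `score_candidates`, the `parts` and `context` keys) and the additive fusion are hypothetical choices for this example, not details taken from MANet.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def attend(query, features):
    """Scaled dot-product attention (an assumption, not the paper's formula).

    Returns the attention-weighted sum of `features` and the weights.
    """
    scale = math.sqrt(len(query))
    weights = softmax([dot(query, f) / scale for f in features])
    dim = len(features[0])
    pooled = [sum(w * f[i] for w, f in zip(weights, features)) for i in range(dim)]
    return pooled, weights


def score_candidates(expr, candidates):
    """Score each candidate region against the expression embedding.

    Each candidate is a dict with hypothetical keys:
      'parts'   - fine-grained features inside the region (attribute level)
      'context' - features of surrounding regions (context level)
    """
    scores = []
    for cand in candidates:
        attr, _ = attend(expr, cand["parts"])
        ctx, _ = attend(expr, cand["context"])
        # Additive fusion is an illustrative choice only.
        fused = [a + c for a, c in zip(attr, ctx)]
        scores.append(dot(expr, fused))
    best = scores.index(max(scores))
    return best, scores


expr = [1.0, 0.0]
candidates = [
    {"parts": [[1.0, 0.0], [0.0, 1.0]], "context": [[1.0, 0.0]]},
    {"parts": [[0.0, 1.0], [0.0, 1.0]], "context": [[0.0, 1.0]]},
]
best, scores = score_candidates(expr, candidates)
```

In this toy input the first candidate's attribute and context features both align with the expression vector, so it receives the higher score; a real system would use learned embeddings and learned projections rather than raw dot products.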