A Multi-scale Semantic Attention Representation for Multi-Label Image Recognition with Graph Networks

Jun Liang, Feiteng Xu, Songsen Yu
DOI: https://doi.org/10.1016/j.neucom.2022.03.057
IF: 6
2022-01-01
Neurocomputing
Abstract: Multi-label image recognition is a fundamental and challenging task in computer vision and multimedia. Graph Convolutional Networks (GCNs) are often used to learn multi-label semantic features and label dependencies. Although the label semantic features learned by GCNs capture the global visual representation of an image well, they are rarely applied to local image regions. We therefore use GCNs to learn global and local features simultaneously and to balance the two. In this paper, we propose MS-SGA-GCN, a multi-scale semantic attention model for multi-label image recognition that comprises three main modules: MS, SGA, and GCN. The Multi-Scale (MS) module uses feature maps of different sizes to obtain global features with strong generalization capability. The Semantic Guide Attention (SGA) module uses the label embeddings learned by the GCN to guide the generation of cross-modality, class-specific attention maps, which locate the semantically related regions for each label. Experiments show that our model achieves classification accuracies of 83.4% and 94.2% on the MS-COCO and PASCAL VOC2007 datasets, respectively, giving it a competitive advantage over other mainstream models.
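The interaction among the three modules described in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the function names, the symmetric adjacency normalization, the LeakyReLU slope, the dot-product attention with a spatial softmax, and the averaging of per-scale scores are all assumptions made for illustration, since the abstract does not specify these details.

```python
import numpy as np

def normalize_adjacency(adj):
    """Symmetrically normalize a non-negative label graph after adding self-loops (assumed scheme)."""
    adj = adj + np.eye(adj.shape[0])
    deg_inv_sqrt = 1.0 / np.sqrt(adj.sum(axis=1))
    return adj * deg_inv_sqrt[:, None] * deg_inv_sqrt[None, :]

def gcn_layer(adj_norm, h, w):
    """One graph convolution over label nodes, followed by LeakyReLU (slope 0.2 is an assumption)."""
    z = adj_norm @ h @ w
    return np.where(z > 0, z, 0.2 * z)

def semantic_guided_attention(feat, label_emb):
    """Hypothetical SGA step.

    feat:      (H, W, D) feature map at one scale
    label_emb: (C, D) label embeddings produced by the GCN
    Returns class-specific attention maps (C, H, W) and attended features (C, D).
    """
    h, w, d = feat.shape
    flat = feat.reshape(h * w, d)
    scores = label_emb @ flat.T                   # (C, H*W) label-to-region similarity
    scores -= scores.max(axis=1, keepdims=True)   # subtract the row max for numerical stability
    attn = np.exp(scores)
    attn /= attn.sum(axis=1, keepdims=True)       # softmax over spatial positions, per label
    class_feat = attn @ flat                      # (C, D) attention-weighted region features
    return attn.reshape(-1, h, w), class_feat

def multi_scale_logits(feature_maps, label_emb):
    """Hypothetical MS step: score each label at every scale, then average across scales."""
    logits = []
    for feat in feature_maps:
        _, class_feat = semantic_guided_attention(feat, label_emb)
        logits.append((class_feat * label_emb).sum(axis=1))  # one score per label
    return np.mean(logits, axis=0)
```

In this sketch each label receives its own attention map over image regions, so the label embedding determines which regions contribute to that label's score; feature maps of different sizes are passed as a list to `multi_scale_logits`, and all of them must share the embedding dimension D.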