Attention-Augmented Memory Network for Image Multi-Label Classification

Wei Zhou,Yanke Hou,Dihu Chen,Haifeng Hu,Tao Su
DOI: https://doi.org/10.1145/3570166
IF: 4.094
2023-01-01
ACM Transactions on Multimedia Computing Communications and Applications
Abstract:The purpose of image multi-label classification is to predict all the object categories presented in an image. Some recent works exploit graph convolution network to capture the correlation between labels. Although promising results have been reported, these methods cannot learn salient object features in the images and ignore the correlation between channel feature maps. In addition, the current researches only learn the feature information within individual input image, but fail to mine the contextual information of various categories from the dataset to enhance the input feature representation. To address these issues, we propose an A ttention- A ugmented M emory N etwork ( AAMN ) model for the image multi-label classification task. Specifically, we first propose a novel categorical memory module to excavate the contextual information of various categories from the dataset to augment the current input feature. Secondly, we design a new channel-relation exploration module to capture the inter-channel relationship of features, so as to enhance the correlation between objects in the images. Thirdly, we develop a spatial-relation enhancement module to model second-order statistics of features and capture long-range dependencies between pixels in feature maps, so as to learn salient object features. Experimental results on standard benchmarks, including MS-COCO 2014, PASCAL VOC 2007, and VG-500, demonstrate the effectiveness and superiority of AAMN model, which outperforms current state-of-the-art methods.
What problem does this paper attempt to address?