Class attention network for image recognition
Gong Cheng, Pujian Lai, Decheng Gao, Junwei Han
DOI: https://doi.org/10.1007/s11432-021-3493-7
2023-01-25
Science China Information Sciences
Abstract: Visual attention has become a widely used component for image recognition. Although various attention-based methods have been proposed and have achieved competitive results, the semantic features of different classes tend to entangle with each other, and few studies so far have focused on explicitly extracting category-aware features. To address this issue, this paper presents an attention-based image recognition method that uses class-specific dictionary learning to disentangle the neural network's outputs into class-dependent features, thus boosting their discriminative ability. Specifically, we develop a class attention network (CANet) by integrating a simple yet effective class-specific attention encoding (CAE) module on top of the convolutional layers. Given the feature maps of a convolutional neural network (CNN), the CAE module learns a class-specific dictionary, which is used to encode an attention map for each category. These attention maps are then multiplied by the input features for class-wise adaptive feature refinement. Extensive experiments on the PASCAL VOC 2007, PASCAL VOC 2012, MS COCO, and CUB-200-2011 datasets demonstrate the strong performance of our method on multiple visual recognition tasks, including multi-label image classification and fine-grained visual classification. In addition, the visualization results show that CNNs can explicitly learn class-wise feature representations when class-specific dictionary learning is introduced.
computer science, information systems; engineering, electrical & electronic
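The data flow that the abstract describes for the CAE module (class-specific dictionary, per-class attention maps, multiplication with the input features) can be illustrated with a minimal NumPy sketch. It is not the authors' implementation. The one-atom-per-class dictionary, the dot-product scoring, the sigmoid normalisation, the global average pooling, and the names `class_attention_encode`, `features`, and `dictionary` are all assumptions made for illustration, and the dictionary is random here rather than learned.

```python
import numpy as np

def class_attention_encode(features, dictionary):
    """Illustrative class-wise attention (assumed form, not the paper's).

    features:   (C, H, W) feature maps from a CNN backbone.
    dictionary: (K, C) hypothetical dictionary with one atom per class.
    Returns attention maps (K, H, W) and pooled class features (K, C).
    """
    # Score every spatial location against every class atom (assumed dot product).
    scores = np.einsum("kc,chw->khw", dictionary, features)
    # Squash scores into (0, 1) with a sigmoid (assumed normalisation).
    attention = 1.0 / (1.0 + np.exp(-scores))
    # Multiply each class attention map by the input features: (K, C, H, W).
    refined = attention[:, None, :, :] * features[None, :, :, :]
    # Global average pooling gives one feature vector per class (assumed).
    class_features = refined.mean(axis=(2, 3))
    return attention, class_features

# Toy usage: 8 channels, a 4x4 spatial grid, 3 classes.
rng = np.random.default_rng(0)
features = rng.standard_normal((8, 4, 4))
dictionary = rng.standard_normal((3, 8))  # random stand-in for a learned dictionary
attention, class_features = class_attention_encode(features, dictionary)
print(attention.shape, class_features.shape)
```

In this sketch each class gets its own refined copy of the feature maps, which is one plausible reading of "class-wise adaptive feature refinement"; a classifier head operating on `class_features` would follow in a full model.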