NOAH: Learning Pairwise Object Category Attentions for Image Classification

Chao Li, Aojun Zhou, Anbang Yao

2024-02-04

Abstract: A modern deep neural network (DNN) for image classification tasks typically consists of two parts: a backbone for feature extraction, and a head for feature encoding and class prediction. We observe that the head structures of mainstream DNNs adopt a similar feature encoding pipeline, exploiting global feature dependencies while disregarding local ones. In this paper, we revisit the feature encoding problem, and propose Non-glObal Attentive Head (NOAH) that relies on a new form of dot-product attention called pairwise object category attention (POCA), efficiently exploiting spatially dense category-specific attentions to augment classification performance. NOAH introduces a neat combination of feature split, transform and merge operations to learn POCAs at local to global scales. As a drop-in design, NOAH can be easily used to replace existing heads of various types of DNNs, improving classification performance while maintaining similar model efficiency. We validate the effectiveness of NOAH on the ImageNet classification benchmark with 25 DNN architectures spanning convolutional neural networks, vision transformers and multi-layer perceptrons. In general, NOAH is able to significantly improve the performance of lightweight DNNs, e.g., showing 3.14%|5.3%|1.9% top-1 accuracy improvement to MobileNetV2 (0.5x)|Deit-Tiny (0.5x)|gMLP-Tiny (0.5x). NOAH also generalizes well when applied to medium-size and large-size DNNs. We further show that NOAH retains its efficacy on other popular multi-class and multi-label image classification benchmarks as well as in different training regimes, e.g., showing 3.6%|1.1% mAP improvement to large ResNet101|ViT-Large on the MS-COCO dataset. Project page:

Computer Vision and Pattern Recognition, Machine Learning

What problem does this paper attempt to address?

The paper focuses on the design of the head structure of deep neural networks (DNNs) for image classification. A modern DNN typically consists of a backbone for feature extraction and a head for feature encoding and category prediction. The paper argues that the heads of mainstream DNNs rely primarily on global feature dependencies during feature encoding and overlook local ones. To address this, the paper proposes the Non-glObal Attentive Head (NOAH), which is built on a new form of dot-product attention called pairwise object category attention (POCA). POCA exploits spatially dense, category-specific attentions at local to global scales to improve classification performance. NOAH learns POCAs through a combination of feature split, transform, and merge operations. As a drop-in design, it can replace the existing head of various types of DNNs, improving classification performance while maintaining similar model efficiency. The effectiveness of NOAH is validated on the ImageNet classification benchmark with 25 DNN architectures, spanning convolutional neural networks, vision transformers, and multi-layer perceptrons. The results show that NOAH significantly improves the accuracy of lightweight DNNs and also generalizes to medium-size and large-size DNNs. The paper further compares NOAH with other head structures, such as global average pooling and label-based designs, and reports an accuracy advantage for NOAH. Finally, NOAH is applied to other multi-class and multi-label image classification benchmarks and to different training regimes, where it retains its efficacy.
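The summary above names the split, transform, and merge operations but does not spell out their internals, so the following is only a minimal sketch of one plausible reading, not the authors' implementation. It assumes that the feature map is flattened to a list of positions, that channels are split into equal groups, that each group is transformed by two parallel per-position linear maps (one producing attention scores, one producing per-category values), that the attention is a softmax over spatial positions for each category, and that group outputs are merged by summation. All function names and shapes here are hypothetical.

```python
import math
import random


def linear(x, w):
    """Per-position linear map (acts like a 1x1 convolution).

    x is [positions][channels], w is [channels][categories].
    """
    return [[sum(xi * w[c][k] for c, xi in enumerate(row))
             for k in range(len(w[0]))] for row in x]


def spatial_softmax(scores):
    """Softmax over positions, computed separately for each category."""
    num_pos, num_cls = len(scores), len(scores[0])
    out = [[0.0] * num_cls for _ in range(num_pos)]
    for k in range(num_cls):
        col = [scores[p][k] for p in range(num_pos)]
        m = max(col)  # subtract the max for numerical stability
        exps = [math.exp(v - m) for v in col]
        total = sum(exps)
        for p in range(num_pos):
            out[p][k] = exps[p] / total
    return out


def attention_block(x, w_attn, w_value):
    """Transform step (assumed form): attention-weighted sum of per-position values."""
    attn = spatial_softmax(linear(x, w_attn))   # dense, category-specific attention
    value = linear(x, w_value)                  # per-position category scores
    return [sum(attn[p][k] * value[p][k] for p in range(len(x)))
            for k in range(len(value[0]))]


def split_transform_merge_head(features, weights):
    """features: [positions][channels]; weights: one (w_attn, w_value) pair per group."""
    width = len(features[0]) // len(weights)
    logits = None
    for g, (w_attn, w_value) in enumerate(weights):
        # Split: take this group's slice of channels at every position.
        chunk = [row[g * width:(g + 1) * width] for row in features]
        out = attention_block(chunk, w_attn, w_value)
        # Merge: sum the per-group category scores (an assumption).
        logits = out if logits is None else [a + b for a, b in zip(logits, out)]
    return logits


# Toy usage: a 3x3 feature map (9 positions), 8 channels, 5 categories, 2 groups.
random.seed(0)
P, C, K, G = 9, 8, 5, 2
features = [[random.gauss(0, 1) for _ in range(C)] for _ in range(P)]
weights = [tuple([[random.gauss(0, 0.1) for _ in range(K)] for _ in range(C // G)]
                 for _ in range(2)) for _ in range(G)]
logits = split_transform_merge_head(features, weights)
print(len(logits))
```

Under these assumptions, setting the attention weights to zero makes the attention uniform over positions, and the head reduces to global average pooling followed by a linear classifier. That is one way to see how a head of this kind can extend the usual global pipeline with spatially local information.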

Learning Attentive Pairwise Interaction for Fine-Grained Classification

HCFNN: High-order Coverage Function Neural Network for Image Classification

HAM: Hybrid Attention Module in Deep Convolutional Neural Networks for Image Classification

Spatial Context-Aware Object-Attentional Network for Multi-Label Image Classification

Class attention network for image recognition

Learning Paired-associate Images with An Unsupervised Deep Learning Architecture

Double Attention Based on Graph Attention Network for Image Multi-Label Classification

TDAPNet: Prototype Network with Recurrent Top-Down Attention for Robust Object Classification under Partial Occlusion

Few-Shot Object Detection Based on Adaptive Attention Mechanism and Large-Margin Softmax

An Attention Module for Convolutional Neural Networks

Context-aware Attentional Pooling (CAP) for Fine-grained Visual Classification

An Image Classification Method Based on Adaptive Attention Mechanism and Feature Extraction Network

Attention CoupleNet: Fully Convolutional Attention Coupling Network for Object Detection

Pairwise Comparison Network for Remote-Sensing Scene Classification

Object detection based on an adaptive attention mechanism

Multi-label Object Attribute Classification using a Convolutional Neural Network

Spatial-Context-Aware Deep Neural Network for Multi-Class Image Classification

Attention on Attention for Image Captioning

Improved Deep Learning of Object Category Using Pose Information

Learning From Human Attention for Attribute-Assisted Visual Recognition