Abstract: The task of image multi-label classification is to accurately recognize multiple objects in an input image. Most recent works rely on a label co-occurrence matrix counted from the training data to construct the graph structure, which is inflexible and may degrade model generalizability. In addition, these methods fail to capture the semantic correlation between channel feature maps, which could further improve model performance. To address these issues, we propose DA-GAT (a Double Attention framework based on the Graph Attention neTwork) to effectively learn the correlation between labels from training data. First, we devise a new channel attention mechanism to enhance the semantic correlation between channel feature maps, so as to implicitly capture the correlation between labels. Second, we propose a new label attention mechanism to avoid the adverse impact of a manually constructed label co-occurrence matrix. It takes only the label embeddings as network input and automatically constructs the label relation matrix to explicitly establish the correlation between labels. Finally, we fuse the outputs of these two attention mechanisms to further improve model performance. Extensive experiments are conducted on three public multi-label classification benchmarks. Our DA-GAT model achieves mean average precision of 87.1%, 96.6%, and 64.3% on MS-COCO 2014, PASCAL VOC 2007, and NUS-WIDE, respectively, and clearly outperforms existing state-of-the-art methods. In addition, visual analysis experiments demonstrate that each attention mechanism captures the correlation between labels well and significantly improves model performance.
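The idea of building a label relation matrix from label embeddings alone, rather than from co-occurrence counts, can be illustrated with a minimal sketch. The abstract does not give the exact formulation, so the code below assumes the standard graph attention scoring (a LeakyReLU over a learned linear score of each label pair, normalized with a row-wise softmax). The function names, the attention vector `attn_vec`, and the toy embeddings are hypothetical and chosen only for illustration; this is not the authors' implementation.

```python
import math


def leaky_relu(x, slope=0.2):
    """LeakyReLU, as commonly used in graph attention scoring."""
    return x if x > 0 else slope * x


def softmax(row):
    """Numerically stable softmax over one row of scores."""
    peak = max(row)
    exps = [math.exp(v - peak) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def label_relation_matrix(embeddings, attn_vec):
    """Build a C x C relation matrix from C label embeddings of size d.

    Assumption: score(i, j) = LeakyReLU(a^T [h_i || h_j]), where attn_vec
    (length 2*d) stands in for a learned parameter. No co-occurrence
    statistics are used.
    """
    d = len(embeddings[0])
    a_src, a_dst = attn_vec[:d], attn_vec[d:]
    # a^T [h_i || h_j] splits into a source term plus a destination term.
    src = [sum(a * h for a, h in zip(a_src, e)) for e in embeddings]
    dst = [sum(a * h for a, h in zip(a_dst, e)) for e in embeddings]
    return [softmax([leaky_relu(s + t) for t in dst]) for s in src]


def propagate(relation, embeddings):
    """Update each label embedding as a relation-weighted sum of all labels."""
    count, d = len(embeddings), len(embeddings[0])
    return [
        [sum(relation[i][j] * embeddings[j][k] for j in range(count)) for k in range(d)]
        for i in range(count)
    ]


# Toy data (hypothetical): three labels with 2-dimensional embeddings.
embeddings = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]]
attn_vec = [0.5, -0.5, 0.5, -0.5]

relation = label_relation_matrix(embeddings, attn_vec)
updated = propagate(relation, embeddings)
```

Each row of `relation` is a probability distribution over labels, so the matrix plays the role that a normalized co-occurrence matrix plays in earlier graph-based methods, but it is computed from the embeddings and would be trained end to end in a real model.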

Using Multi-Label Classification to Improve Object Detection.

A MultiPath Network for Object Detection

Multi-branch Bounding Box Regression for Object Detection

Spatial Context-Aware Object-Attentional Network for Multi-Label Image Classification

Chained-Tracker: Chaining Paired Attentive Regression Results for End-to-End Joint Multiple-Object Detection and Tracking

Multi-scale and Discriminative Part Detectors Based Features for Multi-label Image Classification.

Exploit Bounding Box Annotations for Multi-Label Object Recognition

Multi-Task Learning Via SA-FPN and EJ-Head

Double Attention Based on Graph Attention Network for Image Multi-Label Classification

Multi-Semantic Interactive Learning for Object Detection

Learning to Discover Multi-Class Attentional Regions for Multi-Label Image Recognition

MRMNet: Multi-scale residual multi-branch neural network for object detection

Multi-Attention Multi-Class Constraint for Fine-grained Image Recognition

Two Cases of Sinusitis Induced by Immune Checkpoint Inhibition.

A Multi-Feature Fusion and Attention Network for Multi-Scale Object Detection in Remote Sensing Images

Anno-incomplete Multi-dataset Detection

Multi-Scale Feature Selective Matching Network for Object Detection

Dilated-Scale-Aware Attention ConvNet For Multi-Class Object Counting

MPF-Net: multi-projection filtering network for few-shot object detection

MM-FPN: Multi-path and Multi-scale Feature Pyramid Network for Object Detection

Multi-adversarial Faster-RCNN with Paradigm Teacher for Unrestricted Object Detection