Abstract: Multi-label image classification is a fundamental task in computer vision. Recent methods are mostly based on deep learning and perform well at image understanding. However, previous studies have used convolutional neural networks (CNNs) to capture only image content information, ignoring the semantic structure information and the implicit dependencies between labels and image regions. More effective methods for integrating semantic information and visual features are therefore needed. In this study, we propose FLNet, a novel framework for multi-label image classification that exploits visual features and semantic structure simultaneously. Specifically, to strengthen the association between semantic annotations and image regions, we first integrate an attention mechanism with a CNN to focus on target regions while ignoring irrelevant surrounding information, and then employ a graph convolutional network (GCN) to capture the structural information among multiple labels. Building on this architecture, we also introduce lateral connections that repeatedly inject the label system into the CNN backbone during GCN learning to improve performance and, consequently, to learn interdependent classifiers for each image label. Experiments on two public multi-label benchmark datasets, MS-COCO and PASCAL Visual Object Classes Challenge (VOC 2007), show that our approach outperforms existing state-of-the-art methods. Our method learns specific target regions and strengthens the association between labels and image regions by using semantic information and the attention mechanism, thereby combining the advantages of visual and semantic information to further improve classification performance. Finally, visualizations of the classifier results support the correctness and effectiveness of the proposed method.
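The pipeline described in the abstract (attention over image regions, a GCN over the label graph, and label-wise classifiers applied to the visual feature) can be illustrated with a minimal sketch. This is not the FLNet implementation: the label graph, the dimensions, the random weights, and all function names below are hypothetical, the CNN feature map is replaced by random region vectors, and the lateral connections are omitted because the abstract does not specify them. Plain Python is used to keep the example dependency-free.

```python
import math
import random


def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def normalize_adjacency(adj):
    """Return D^{-1/2} (A + I) D^{-1/2}, the common GCN normalization."""
    n = len(adj)
    a_hat = [[adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)] for i in range(n)]
    inv_sqrt_deg = [1.0 / math.sqrt(sum(row)) for row in a_hat]
    return [[inv_sqrt_deg[i] * a_hat[i][j] * inv_sqrt_deg[j] for j in range(n)]
            for i in range(n)]


def gcn_layer(a_norm, h, w, relu=True):
    """One graph convolution: propagate over the label graph, then project."""
    out = matmul(matmul(a_norm, h), w)
    if relu:
        out = [[max(0.0, v) for v in row] for row in out]
    return out


def attention_pool(regions, query):
    """Softmax-weighted average of region features (a simple spatial attention)."""
    scores = [sum(f * q for f, q in zip(region, query)) for region in regions]
    top = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    dim = len(regions[0])
    pooled = [sum(w * region[k] for w, region in zip(weights, regions)) for k in range(dim)]
    return pooled, weights


def random_matrix(rows, cols, rng):
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


rng = random.Random(0)
num_labels, embed_dim, hidden_dim, feat_dim, num_regions = 3, 4, 5, 6, 9

# Hypothetical symmetric label co-occurrence graph (1.0 = the two labels co-occur).
adjacency = [[0.0, 1.0, 0.0], [1.0, 0.0, 1.0], [0.0, 1.0, 0.0]]
a_norm = normalize_adjacency(adjacency)

# Two GCN layers map label embeddings to one classifier vector per label.
label_embeddings = random_matrix(num_labels, embed_dim, rng)
w1 = random_matrix(embed_dim, hidden_dim, rng)
w2 = random_matrix(hidden_dim, feat_dim, rng)
hidden = gcn_layer(a_norm, label_embeddings, w1)
classifiers = gcn_layer(a_norm, hidden, w2, relu=False)

# Stand-in for a CNN feature map: one feature vector per image region.
regions = random_matrix(num_regions, feat_dim, rng)
query = [rng.uniform(-0.5, 0.5) for _ in range(feat_dim)]
image_feature, attn = attention_pool(regions, query)

# Each label's classifier scores the attended image feature independently.
logits = [sum(c * x for c, x in zip(row, image_feature)) for row in classifiers]
probs = [1.0 / (1.0 + math.exp(-z)) for z in logits]
print(probs)
```

Because the classifier vectors are produced by propagating over the label graph, labels that co-occur share information, which is one way to obtain the interdependent classifiers the abstract refers to. In a trained model the weights and the attention query would be learned with a multi-label loss rather than sampled at random.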

A Multi-scale Semantic Attention Representation for Multi-Label Image Recognition with Graph Networks

Semantic-Interactive Graph Convolutional Network for Multilabel Image Recognition

An Attention-Driven Multi-label Image Classification with Semantic Embedding and Graph Convolutional Networks

Learning Semantic-Specific Graph Representation for Multi-Label Image Recognition

Graph Attention Mechanism with Global Contextual Information for Multi-Label Image Recognition

Multi-Label Image Recognition With Graph Convolutional Networks

Multiple Semantic Embedding with Graph Convolutional Networks for Multi-Label Image Classification

Multi-Label Image Classification with Attention Mechanism and Graph Convolutional Networks

Multi-label Image Recognition with Two-Stream Dynamic Graph Convolution Networks

Semantic-Guided Representation Enhancement for Multi-Label Image Classification

Semantic-Aware Graph Matching Mechanism for Multi-Label Image Recognition

Learning Graph Convolutional Networks for Multi-Label Recognition and Applications

Attention-Driven Dynamic Graph Convolutional Network for Multi-Label Image Recognition

Learning to Discover Multi-Class Attentional Regions for Multi-Label Image Recognition

Multi-Scale Cross-Modal Spatial Attention Fusion for Multi-label Image Recognition

Multi-Label Classification with Label Graph Superimposing

Multilabel Recognition Algorithm With Multigraph Structure

Cross-Modal Feature Representation Learning and Label Graph Mining in a Residual Multi-Attentional CNN-LSTM Network for Multi-Label Aerial Scene Classification

Label-Guided Cross-Modal Attention Network for Multi-Label Aerial Image Classification