Abstract: Multi-label image classification is a fundamental and vital task in computer vision. The latest methods are mostly based on deep learning and exhibit excellent performance in understanding images. However, previous studies have captured only the image content information using convolutional neural networks (CNNs) and have ignored the semantic structure information and the implicit dependencies between labels and image regions. Therefore, it is necessary to develop more effective methods for integrating semantic information and visual features in multi-label image classification. In this study, we propose a novel framework for multi-label image classification, named FLNet, which simultaneously takes advantage of visual features and semantic structure. Specifically, to enhance the association between semantic annotations and image regions, we first integrate an attention mechanism with a CNN to focus on the target regions while ignoring irrelevant surrounding information, and then employ a graph convolutional network (GCN) to capture the structural information between multiple labels. Building on this architecture, we also introduce lateral connections that repeatedly inject the label system into the CNN backbone during GCN learning to improve performance and, consequently, learn interdependent classifiers for each image label. Experiments on two public multi-label benchmark datasets, MS-COCO and the PASCAL Visual Object Classes Challenge (VOC 2007), demonstrate that our approach outperforms existing state-of-the-art methods. Our method learns specific target regions and enhances the association between labels and image regions by using semantic information and the attention mechanism, thereby combining the advantages of visual and semantic information to further improve image classification performance. Finally, the correctness and effectiveness of the proposed method are demonstrated by visualizing the classifier results.
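The general idea described in the abstract (attention over image regions, plus a GCN over the label graph that produces one classifier per label) can be illustrated with a minimal NumPy sketch. This is not the FLNet implementation: the lateral connections are omitted, the weights are random, and every name and dimension here (`normalize_adjacency`, `gcn_classifiers`, `attention_pool`, `w_att`, the sizes `C`, `E`, `Hd`, `D`, `R`) is a hypothetical choice made only for illustration.

```python
import numpy as np

def normalize_adjacency(A):
    """Symmetric normalization D^-1/2 (A + I) D^-1/2 used by standard GCNs."""
    A_hat = A + np.eye(A.shape[0])            # add self-loops
    d_inv_sqrt = 1.0 / np.sqrt(A_hat.sum(axis=1))
    return A_hat * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]

def gcn_classifiers(label_emb, A_norm, W1, W2):
    """Two GCN layers mapping label embeddings (C, E) to classifiers (C, D)."""
    H = np.maximum(A_norm @ label_emb @ W1, 0.0)  # graph convolution + ReLU
    return A_norm @ H @ W2

def attention_pool(regions, w_att):
    """Softmax attention over R region features (R, D), then a weighted sum.

    w_att is a hypothetical (D,) scoring vector, not taken from the paper.
    """
    scores = regions @ w_att
    scores = scores - scores.max()            # numerical stability
    alpha = np.exp(scores) / np.exp(scores).sum()
    return alpha @ regions, alpha

# Assumed toy sizes: labels, embedding dim, hidden dim, feature dim, regions.
C, E, Hd, D, R = 4, 8, 16, 32, 49
rng = np.random.default_rng(0)

# Toy symmetric label co-occurrence graph (random here; real graphs would
# come from label statistics of the training set).
A = (rng.random((C, C)) > 0.5).astype(float)
A = np.maximum(A, A.T)
np.fill_diagonal(A, 0.0)
A_norm = normalize_adjacency(A)

label_emb = rng.normal(size=(C, E))           # stand-in for word embeddings
W1 = rng.normal(size=(E, Hd)) * 0.1
W2 = rng.normal(size=(Hd, D)) * 0.1
classifiers = gcn_classifiers(label_emb, A_norm, W1, W2)   # (C, D)

regions = rng.normal(size=(R, D))             # stand-in for a CNN feature map
pooled, alpha = attention_pool(regions, rng.normal(size=D))

logits = classifiers @ pooled                 # one score per label
probs = 1.0 / (1.0 + np.exp(-logits))         # independent sigmoid per label
```

Because each row of `classifiers` is computed from its neighbors in the label graph, the per-label classifiers are interdependent, while the attention weights `alpha` determine which image regions contribute to the pooled feature.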

Semantic Embedded Deep Neural Network: A Generic Approach to Boost Multi-Label Image Classification Performance

Semantic-Guided Representation Enhancement for Multi-Label Image Classification

Deep Semantic Dictionary Learning for Multi-label Image Classification

Multi-label learning with semantic embeddings

An Attention-Driven Multi-label Image Classification with Semantic Embedding and Graph Convolutional Networks

Spatial-Context-Aware Deep Neural Network for Multi-Class Image Classification

Multi-layered Semantic Representation Network for Multi-label Image Classification

Semantic Supplementary Network With Prior Information for Multi-Label Image Classification

Two-Stage Label Embedding Via Neural Factorization Machine for Multi-Label Classification

Spatial Context-Aware Object-Attentional Network for Multi-Label Image Classification

Attend and Imagine: Multi-Label Image Classification with Visual Attention and Recurrent Neural Networks

Bi-Modal Learning with Channel-Wise Attention for Multi-Label Image Classification

Multiple Semantic Embedding with Graph Convolutional Networks for Multi-Label Image Classification

Multi-label image annotation based on multi-model

Multi-branch Prediction Network for Multi-label Social Image Classification

Semantic Image Segmentation Via Guidance of Image Classification

SLED: Semantic Label Embedding Dictionary Representation for Multi-label Image Annotation

Deep Learning for Multilabel Remote Sensing Image Annotation with Dual-Level Semantic Concepts

A multi-label image classification method combining multi-stage image semantic information and label relevance

Transformer-Driven Semantic Relation Inference for Multilabel Classification of High-Resolution Remote Sensing Images

A Deep Modular Label Attention Network for Multi-label Text Classification