Abstract: Data in the form of images are now generated at an unprecedented rate. A case in point is remote sensing images (RSI), now available in large-scale RSI archives, which have attracted a considerable amount of research on image classification within the remote sensing community. The basic task of single-target multi-class image classification considers the case where each image is assigned exactly one label from a predefined finite set of class labels. Recently, however, image annotations have become increasingly complex, with images labeled with several labels (instead of just one). In other words, the goal is to assign multiple semantic categories to an image, based on its high-level context. The corresponding machine learning task is called multi-label classification (MLC). The classification of RSI is currently predominantly addressed by deep neural network (DNN) approaches, especially convolutional neural networks (CNNs), which can be utilized as feature extractors as well as end-to-end methods. After a long period in which only single-target classification was considered, DNNs that address the MLC task have recently emerged. On the other hand, trees and tree ensembles for MLC have a long tradition and are the best-performing class of MLC methods, but they need predefined feature representations to operate on. In this work, we explore different strategies for model training based on the transfer learning paradigm, where we utilize different families of (pre-trained) CNN architectures, such as VGG, EfficientNet, and ResNet. The architectures are trained in an end-to-end manner and used in two different modes of operation, namely, as standalone models that directly perform the MLC task, and as feature extractors. In the latter case, the learned representations are used with tree ensemble methods for MLC, such as random forests and extremely randomized trees.
We conduct an extensive experimental analysis of these methods over several publicly available RSI datasets and evaluate their effectiveness in terms of standard MLC measures. Of these, ranking-based evaluation measures are the most relevant, especially ranking loss. The results show that, for addressing the RSI-MLC task, it is favorable to use lightweight network architectures, such as EfficientNet-B2, which performs best both as an end-to-end approach and as a feature extractor. Furthermore, on datasets with a limited number of images, using traditional tree ensembles for MLC can yield better performance than end-to-end deep approaches.
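
To make the ranking loss measure concrete, the following is a minimal sketch in plain Python, not the evaluation code used in this work. It assumes the common definition: for each image, the fraction of (relevant, irrelevant) label pairs in which the irrelevant label is scored at least as high as the relevant one, averaged over images. The function name `ranking_loss`, the treatment of ties as errors, and the toy labels and scores are all illustrative assumptions.

```python
def ranking_loss(y_true, y_score):
    """Average fraction of (relevant, irrelevant) label pairs ranked in the wrong order.

    y_true:  list of 0/1 label vectors, one per image.
    y_score: list of predicted score vectors with the same shape as y_true.
    Assumption of this sketch: a tie between a relevant and an irrelevant
    label counts as a mis-ordered pair.
    """
    total = 0.0
    for labels, scores in zip(y_true, y_score):
        relevant = [s for l, s in zip(labels, scores) if l == 1]
        irrelevant = [s for l, s in zip(labels, scores) if l == 0]
        if not relevant or not irrelevant:
            continue  # no pairs to compare; this image contributes zero
        wrong = sum(1 for r in relevant for i in irrelevant if i >= r)
        total += wrong / (len(relevant) * len(irrelevant))
    return total / len(y_true)


# Toy example (hypothetical data): two images, four labels each.
y_true = [[1, 0, 1, 0], [0, 1, 0, 0]]
y_score = [[0.9, 0.2, 0.4, 0.6], [0.1, 0.8, 0.3, 0.2]]
loss = ranking_loss(y_true, y_score)
print(loss)
```

In the first toy image, one of the four relevant/irrelevant pairs is mis-ordered (the relevant label scored 0.4 falls below the irrelevant label scored 0.6), while the second image is ranked perfectly. Lower values are better, and a value of 0 means every relevant label is scored above every irrelevant one.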

Dual-stream multi-label image classification model enhanced by feature reconstruction

Deep Dual-Stream Network with Scale Context Selection Attention Module for Semantic Segmentation

Dual Enhancement for Multi-Label Learning with Missing Labels

Multiscale 3-D-2-D Mixed CNN and Lightweight Attention-Free Transformer for Hyperspectral and LiDAR Classification

A multi-label image classification method combining multi-stage image semantic information and label relevance

Dual-Branch Feature Fusion Network Based Cross-Modal Enhanced CNN and Transformer for Hyperspectral and LiDAR Classification

DSDCLNet: Dual-stream encoder and dual-level contrastive learning network for supervised multivariate time series classification

MSFA: Multi‐stage feature aggregation network for multi‐label image recognition

Multi-scale and Discriminative Part Detectors Based Features for Multi-label Image Classification.

Asymmetric Vision Transformers for Multi-Label Classification

Deep Network Architectures as Feature Extractors for Multi-Label Classification of Remote Sensing Images

MULTI-LABEL IMAGE RECOGNITION WITH JOINT CLASS-AWARE MAP DISENTANGLING AND LABEL CORRELATION EMBEDDING

Semantic and Correlation Disentangled Graph Convolutions for Multilabel Image Recognition.

Deep dual incomplete multi-view multi-label classification via label semantic-guided contrastive learning

Image Reconstruction of Multi Branch Feature Multiplexing Fusion Network with Mixed Multi-layer Attention

Multilabel Convolutional Network With Feature Denoising and Details Supplement

Transformer-based Multi-Modal Learning for Multi Label Remote Sensing Image Classification

Multi-Label Remote Sensing Image Scene Classification by Combining a Convolutional Neural Network and a Graph Neural Network

A Lightweight Multi-Scale Channel Attention Network for Image Super-Resolution.

Disentangling, Embedding and Ranking Label Cues for Multi-Label Image Recognition

CDDFuse: Correlation-Driven Dual-Branch Feature Decomposition for Multi-Modality Image Fusion