Abstract: Multilabel image classification aims to assign multiple labels to each image. Because each image may be associated with several labels at once, the task is more challenging than single-label classification. Three problems motivate this study. First, convolutional neural networks (CNNs) on their own have not met the performance requirements for exploiting statistical dependencies between labels. Second, data imbalance, a common problem in machine learning, must be taken into account in multilabel medical image classification. Third, simply concatenating a CNN and a transformer leaves the two models without direct interaction or information exchange. To address these issues, we propose a novel hybrid deep learning model called CTransCNN. The model comprises three main components in both the CNN and transformer branches: a multilabel multihead attention enhanced feature module (MMAEF), a multibranch residual module (MBR), and an information interaction module (IIM). The MMAEF explores implicit correlations between labels, the MBR facilitates model optimization, and the IIM enhances feature transmission and increases nonlinearity between the two branches, which together help accomplish the multilabel medical image classification task. We evaluated our approach on two publicly available datasets, ChestX-ray11 and NIH ChestX-ray14, and on our self-constructed traditional Chinese medicine tongue dataset (TCMTD). Extensive multilabel image classification experiments compared our approach with strong existing methods. The experimental results demonstrate that the proposed framework is highly competitive with previous research. Its robust generalization ability makes it applicable to other medical multilabel image classification tasks.
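To make the multilabel setting and the imbalance issue concrete, the following is a minimal sketch, not the CTransCNN method itself. It assumes the common formulation in which each label gets an independent sigmoid output trained with binary cross-entropy, and it uses a hypothetical negatives-to-positives ratio as the per-label weight. The label names, counts, and function names are illustrative assumptions and do not come from the paper.

```python
import math

def multi_hot(labels, classes):
    """Encode a set of label names as a 0/1 vector over `classes`."""
    return [1.0 if c in labels else 0.0 for c in classes]

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def pos_weights_from_counts(pos_counts, n_samples):
    """Assumed heuristic: weight each label by negatives / positives."""
    return [(n_samples - c) / c for c in pos_counts]

def weighted_bce(logits, targets, pos_weights):
    """Mean binary cross-entropy over labels, with positive terms
    scaled by `pos_weights` so rare labels contribute more.
    Not numerically hardened: fine for moderate logits only."""
    total = 0.0
    for z, t, w in zip(logits, targets, pos_weights):
        p = sigmoid(z)
        total += -(w * t * math.log(p) + (1.0 - t) * math.log(1.0 - p))
    return total / len(logits)

# Illustrative label set; one image can carry several labels at once.
classes = ["atelectasis", "effusion", "nodule"]
target = multi_hot({"atelectasis", "nodule"}, classes)

# Made-up positive counts out of 100 training images.
weights = pos_weights_from_counts([50, 25, 10], n_samples=100)

# Made-up per-label logits, standing in for a model's output.
loss = weighted_bce([2.0, -1.0, 0.5], target, weights)
print(target, weights, round(loss, 4))
```

Unlike a softmax over classes, the per-label sigmoid lets any subset of labels be active for one image. This loss treats labels independently, so it does not capture the label correlations that the abstract attributes to the MMAEF.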

Distance Restricted Transformer Encoder for Multi-Label Classification

Region-Awared Transformer with Asymmetric Loss in Multi-Label Classification

Asymmetric Vision Transformers for Multi-Label Classification

Multiscale 3-D-2-D Mixed CNN and Lightweight Attention-Free Transformer for Hyperspectral and LiDAR Classification

Query2Label: A Simple Transformer Way to Multi-Label Classification

Multi-label classification of retinal disease via a novel vision transformer model

HSVLT: Hierarchical Scale-Aware Vision-Language Transformer for Multi-Label Image Classification

SST: Spatial and Semantic Transformers for Multi-Label Image Recognition

Cross-Domain Hyperspectral Image Classification Based on Transformer

Diverse Instance Discovery: Vision-Transformer for Instance-Aware Multi-Label Image Recognition

GA-Based Weighted Ensemble Learning for Multi-Label Aerial Image Classification Using Convolutional Neural Networks and Vision Transformers

Graph Attention Transformer Network for Multi-label Image Classification

A multimodal hyper-fusion transformer for remote sensing image classification

ClassFormer: Exploring Class-Aware Dependency with Transformer for Medical Image Segmentation

Learning transformer-based heterogeneously salient graph representation for multimodal remote sensing image classification

CTransCNN: Combining transformer and CNN in multilabel medical image classification

Multi-Label Retinal Disease Classification using Transformers

Transformer-based Multi-Modal Learning for Multi Label Remote Sensing Image Classification

Dual-stream multi-label image classification model enhanced by feature reconstruction

Discriminative Vision Transformer for Heterogeneous Cross-Domain Hyperspectral Image Classification