Investigating the Robustness of Vision Transformers against Label Noise in Medical Image Classification

Bidur Khanal, Prashant Shrestha, Sanskar Amgain, Bishesh Khanal, Binod Bhattarai, Cristian A. Linte

2024-02-27

Abstract: Label noise in medical image classification datasets significantly hampers the training of supervised deep learning methods, undermining their generalizability. The test performance of a model tends to decrease as the label noise rate increases. Over recent years, several methods have been proposed to mitigate the impact of label noise in medical image classification and enhance the robustness of the model. Predominantly, these works have employed CNN-based architectures as the backbone of their classifiers for feature extraction. However, in recent years, Vision Transformer (ViT)-based backbones have replaced CNNs, demonstrating improved performance and a greater ability to learn more generalizable features, especially when the dataset is large. Nevertheless, no prior work has rigorously investigated how transformer-based backbones handle the impact of label noise in medical image classification. In this paper, we investigate the architectural robustness of ViT against label noise and compare it to that of CNNs. We use two medical image classification datasets, COVID-DU-Ex and NCT-CRC-HE-100K, both corrupted by injecting label noise at various rates. Additionally, we show that pretraining is crucial for ensuring ViT's improved robustness against label noise in supervised training.
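The abstract mentions corrupting the datasets by injecting label noise at various rates. A minimal sketch of one common way to do this is shown below. It assumes symmetric (uniform) noise, where a chosen fraction of labels is flipped to a different class picked uniformly at random; the abstract does not state which noise model the authors used, and the function name and parameters are hypothetical.

```python
import random


def inject_symmetric_label_noise(labels, num_classes, noise_rate, seed=0):
    """Return (noisy_labels, flipped_indices).

    Assumption: symmetric noise. Exactly round(noise_rate * N) labels are
    replaced by a *different* class drawn uniformly at random. This is an
    illustrative noise model, not necessarily the one used in the paper.
    """
    if not 0.0 <= noise_rate <= 1.0:
        raise ValueError("noise_rate must be in [0, 1]")
    if num_classes < 2:
        raise ValueError("need at least two classes to flip a label")

    rng = random.Random(seed)          # local RNG keeps the corruption reproducible
    noisy = list(labels)               # never mutate the clean labels
    n_flip = round(noise_rate * len(noisy))
    flip_idx = rng.sample(range(len(noisy)), n_flip)

    for i in flip_idx:
        # Exclude the current class so every selected label really changes.
        candidates = [c for c in range(num_classes) if c != noisy[i]]
        noisy[i] = rng.choice(candidates)

    return noisy, sorted(flip_idx)


if __name__ == "__main__":
    clean = [i % 4 for i in range(20)]          # toy labels, 4 classes
    noisy, flipped = inject_symmetric_label_noise(clean, 4, noise_rate=0.3)
    print("flipped positions:", flipped)
    print("noisy labels:     ", noisy)
```

Keeping the list of flipped indices makes it possible to check later how many corrupted samples a noise-robust method actually filters out.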

Image and Video Processing, Computer Vision and Pattern Recognition, Machine Learning

What problem does this paper attempt to address?

The paper examines how robust Vision Transformers (ViT) are to label noise in medical image classification and compares them with traditional Convolutional Neural Networks (CNNs). Specifically, the researchers focus on the following aspects:

1. **Impact of Label Noise**: The researchers point out that label noise in medical image classification datasets can severely impair the training of supervised learning methods, weakening the model's ability to generalize. As the label noise rate increases, the model's test performance usually declines.
2. **Comparison between ViT and CNN**: Although ViT has performed strongly on many benchmarks in recent years, little research has examined how a ViT backbone handles label noise. This paper therefore experimentally compares ViT and CNN (represented by ResNet18) under different label noise rates.
3. **Role of Self-Supervised Pre-Training**: The study finds that self-supervised pre-training is crucial for improving the robustness of ViT under label noise. Two self-supervised pre-training methods, Masked Autoencoders (MAE) and SimMIM, significantly improve ViT's performance at high label noise rates.
4. **Application of the Co-Teaching Method**: The researchers also apply the Co-teaching label noise learning method to ViT. The results show that, without pre-training, Co-teaching is less effective on ViT than on ResNet18; with pre-training, ViT performs significantly better than the same model without pre-training.

In summary, the core objective of this paper is to evaluate the robustness of ViT to label noise in medical image classification and to demonstrate experimentally that appropriate self-supervised pre-training can significantly improve ViT's performance in the presence of label noise.
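The Co-teaching method mentioned above trains two networks side by side: in each mini-batch, each network picks the samples on which its own loss is smallest (these are more likely to be correctly labeled) and passes them to its peer for the parameter update. The sketch below illustrates only that selection-and-exchange step on plain lists of per-sample losses. The function names, the `warmup_epochs` default, and the keep-ratio schedule are assumptions in the spirit of the original Co-teaching formulation, not the settings used in this paper.

```python
def keep_ratio(epoch, noise_rate, warmup_epochs=10):
    """Fraction of each mini-batch to keep at a given epoch.

    Assumed schedule: keep everything at epoch 0, then linearly drop
    samples until a fraction `noise_rate` is discarded after
    `warmup_epochs` epochs.
    """
    return 1.0 - min(epoch / warmup_epochs * noise_rate, noise_rate)


def small_loss_indices(losses, ratio):
    """Indices of the `ratio` fraction of samples with the smallest loss."""
    n_keep = max(1, int(ratio * len(losses)))
    order = sorted(range(len(losses)), key=lambda i: losses[i])
    return sorted(order[:n_keep])


def coteaching_exchange(losses_a, losses_b, ratio):
    """Return (indices network A trains on, indices network B trains on).

    Each network is updated on the samples its *peer* considers clean,
    which is the cross-update that gives Co-teaching its name.
    """
    picked_by_a = small_loss_indices(losses_a, ratio)
    picked_by_b = small_loss_indices(losses_b, ratio)
    return picked_by_b, picked_by_a


if __name__ == "__main__":
    # Toy per-sample losses for one mini-batch of four samples.
    losses_a = [0.1, 2.0, 0.3, 1.5]
    losses_b = [0.2, 0.1, 3.0, 0.4]
    ratio = keep_ratio(epoch=10, noise_rate=0.5)
    train_a_on, train_b_on = coteaching_exchange(losses_a, losses_b, ratio)
    print("A updates on:", train_a_on)
    print("B updates on:", train_b_on)
```

In a real training loop the loss lists would come from a per-sample (unreduced) cross-entropy computed by each backbone, whether ResNet18 or ViT, and only the returned indices would contribute to each network's gradient step.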

Evaluating and enhancing the robustness of vision transformers against adversarial attacks in medical imaging

Towards efficient diagnostics: refining vision transformers for medical image multi-label classification

Implementing vision transformer for classifying 2D biomedical images

Vision Transformer for Efficient Chest X-ray and Gastrointestinal Image Classification

MedViT: A robust vision transformer for generalized medical image classification

Pure Vision Transformer (CT-ViT) with Noise2Neighbors Interpolation for Low-Dose CT Image Denoising

A Comparative Study of CNN, ResNet, and Vision Transformers for Multi-Classification of Chest Diseases

Pretrained ViTs Yield Versatile Representations For Medical Images

Data-Efficient Vision Transformers for Multi-Label Disease Classification on Chest Radiographs

Robust prostate disease classification using transformers with discrete representations

A New Perspective to Boost Vision Transformer for Medical Image Classification

Vision transformers (ViT) and deep convolutional neural network (D-CNN)-based models for MRI brain primary tumors images multi-classification supported by explainable artificial intelligence (XAI)

Echoes of images: multi-loss network for image retrieval in vision transformers

On the Adversarial Robustness of Vision Transformers

Reveal of Vision Transformers Robustness against Adversarial Attacks

Is it Time to Replace CNNs with Transformers for Medical Images?

Evaluating Robustness of Vision Transformers on Imbalanced Datasets (Student Abstract)

Multi-label classification of retinal disease via a novel vision transformer model

Comparison of Vision Transformers and Convolutional Neural Networks in Medical Image Analysis: A Systematic Review

ViT-V-Net: Vision Transformer for Unsupervised Volumetric Medical Image Registration