Abstract: In recent years, the rapid growth of medical imaging data has led to the development of machine learning algorithms for a wide range of healthcare applications. The MedMNISTv2 dataset, a comprehensive benchmark for 2D biomedical image classification, encompasses diverse medical imaging modalities such as fundus camera, breast ultrasound, colon pathology, and blood cell microscopy. Highly accurate classification on these datasets is crucial for identifying diseases and determining the course of treatment. This research paper presents a comprehensive analysis of four subsets of the MedMNISTv2 dataset: BloodMNIST, BreastMNIST, PathMNIST and RetinaMNIST. These datasets cover different imaging modalities and sample sizes, and were selected to analyze how well the model performs across diverse data modalities. The study assesses the Vision Transformer model's ability to capture the intricate patterns and features crucial for medical image classification, and thereby to substantially surpass the benchmark metrics. The methodology consists of pre-processing the input images and then training the ViT-base-patch16-224 model on the selected datasets. The performance of the model is assessed using key metrics and by comparing the achieved classification accuracies with the benchmark accuracies. With ViT, the new benchmarks achieved for BloodMNIST, BreastMNIST, PathMNIST and RetinaMNIST are 97.90%, 90.38%, 94.62% and 57%, respectively. The study highlights the promise of Vision Transformer models in medical image analysis, paving the way for their adoption and further exploration in healthcare applications, with the aim of enhancing diagnostic accuracy and assisting medical professionals in clinical decision-making.

What problem does this paper attempt to address?

The paper aims to address the following key issues:

### Research Background and Objectives

- **Problem Definition**: Use machine learning algorithms in medical image classification to assist medical diagnosis, improve diagnostic accuracy, and reduce misdiagnoses and false positives.
- **Specific Objectives**:
  - Evaluate the ability of the Vision Transformer (ViT) model to capture key features and patterns in complex medical images.
  - Conduct a detailed analysis of four subsets of the MedMNISTv2 dataset (BloodMNIST, BreastMNIST, PathMNIST, and RetinaMNIST).
  - Classify these four datasets using the ViT model and compare the results with the benchmark accuracies, with the aim of surpassing existing performance metrics.
  - Determine the effectiveness of the model through additional evaluation metrics.

### Methodology

- **Dataset**: The study uses the MedMNISTv2 dataset, a comprehensive benchmark dataset containing various medical imaging modalities (such as fundus camera images, breast ultrasound, colon pathology, and blood cell microscopy). These images are preprocessed and standardized to 28×28 pixels.
- **Model Selection**: The paper employs the Vision Transformer (ViT) model, specifically the pre-trained ViT-Base-Patch16-224 model. This model is pre-trained on ImageNet-21k and fine-tuned on the ImageNet 2012 dataset.
- **Experimental Setup**: The paper applies specific preprocessing steps to each dataset, including conversion to RGB format, resizing to 224×224 pixels, and normalizing pixel values. The pre-trained model is then fine-tuned as needed and used to classify the images.
- **Evaluation Metrics**: In addition to accuracy, metrics such as F1 score, precision, and recall are used to comprehensively evaluate the model's performance.

### Main Contributions

- Provides the first evaluation of the ViT-Base-Patch16-224 model's ability to capture the key features and patterns required for medical image classification.
- Provides a comprehensive analysis of four important datasets in MedMNISTv2.
- Demonstrates the superior performance of the ViT model on these datasets through experimental results and compares it with existing benchmarks.
- Showcases the potential of the ViT model in the field of medical image analysis, providing a more accurate and reliable image classification system for medical applications.

In summary, the goal of this paper is to improve the accuracy and reliability of medical image classification using the Vision Transformer model, thereby supporting the clinical decision-making process.
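The preprocessing steps named in the Experimental Setup (conversion to RGB, resizing to 224×224, normalizing pixel values) can be illustrated with a minimal NumPy sketch. This is not the paper's code. The function name `preprocess`, the nearest-neighbour upsampling, and the per-channel mean and standard deviation of 0.5 are all assumptions made for illustration; the summary does not state which interpolation or normalization constants the authors used.

```python
import numpy as np

# Assumed normalization constants (0.5 per channel); the source only says
# that pixel values are normalized, not which statistics are used.
MEAN = np.array([0.5, 0.5, 0.5], dtype=np.float32)
STD = np.array([0.5, 0.5, 0.5], dtype=np.float32)


def preprocess(image: np.ndarray, size: int = 224) -> np.ndarray:
    """Turn a uint8 image (H x W or H x W x 3) into a normalized 3 x size x size array."""
    if image.ndim == 2:
        # Grayscale -> RGB by repeating the single channel three times.
        image = np.stack([image] * 3, axis=-1)
    height, width, _ = image.shape
    if size % height or size % width:
        raise ValueError("this sketch only supports integer upscaling factors")
    # Nearest-neighbour upsampling (an assumption): each pixel is repeated
    # size // height times vertically and size // width times horizontally.
    image = np.repeat(image, size // height, axis=0)
    image = np.repeat(image, size // width, axis=1)
    # Scale to [0, 1], then normalize each channel.
    scaled = image.astype(np.float32) / 255.0
    normalized = (scaled - MEAN) / STD
    # Channels-first layout, as commonly expected by ViT implementations.
    return normalized.transpose(2, 0, 1)


# Usage: a dummy 28x28 grayscale image, such as a BreastMNIST sample would be.
dummy = np.zeros((28, 28), dtype=np.uint8)
dummy[0, 0] = 255
pixels = preprocess(dummy)
print(pixels.shape)
```

In practice a bilinear or bicubic resize from an image library would likely replace the nearest-neighbour step; it is used here only to keep the sketch dependency-light.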
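The evaluation metrics listed above (accuracy, precision, recall, F1 score) can be computed from predicted and true labels as in the following sketch. The helper `macro_metrics` is hypothetical, and macro averaging over classes is an assumption: the summary does not say whether the paper reports macro, micro, or weighted averages.

```python
from collections import Counter


def macro_metrics(y_true, y_pred):
    """Return accuracy and macro-averaged precision, recall, and F1 (assumed averaging)."""
    labels = sorted(set(y_true) | set(y_pred))
    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            tp[truth] += 1
        else:
            fp[pred] += 1   # predicted class gains a false positive
            fn[truth] += 1  # true class gains a false negative

    precisions, recalls, f1s = [], [], []
    for label in labels:
        predicted = tp[label] + fp[label]
        actual = tp[label] + fn[label]
        precision = tp[label] / predicted if predicted else 0.0
        recall = tp[label] / actual if actual else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        precisions.append(precision)
        recalls.append(recall)
        f1s.append(f1)

    n = len(labels)
    return {
        "accuracy": sum(tp.values()) / len(y_true),
        "precision": sum(precisions) / n,
        "recall": sum(recalls) / n,
        "f1": sum(f1s) / n,
    }


# Usage with made-up labels for a three-class problem (not data from the paper).
scores = macro_metrics([0, 0, 1, 1, 2, 2], [0, 1, 1, 1, 2, 0])
print(scores)
```

Macro averaging weights every class equally, which matters for imbalanced subsets; a weighted average would instead follow the class frequencies.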

Implementing vision transformer for classifying 2D biomedical images

Vision Transformer for Efficient Chest X-ray and Gastrointestinal Image Classification

Vision Transformers in Medical Computer Vision -- A Contemplative Retrospection

Pretrained ViTs Yield Versatile Representations For Medical Images

Vision transformer-convolution for breast cancer classification using mammography images: A comparative study

A Novel Vision Transformer with Residual in Self-attention for Biomedical Image Classification

MetaV: A Pioneer in feature Augmented Meta-Learning Based Vision Transformer for Medical Image Classification

Advances in Medical Image Analysis with Vision Transformers: A Comprehensive Review

Towards efficient diagnostics: refining vision transformers for medical image multi-label classification

MedViT: A robust vision transformer for generalized medical image classification

DCT-HistoTransformer: Efficient Lightweight Vision Transformer with DCT Integration for histopathological image analysis

Skin Cancer Segmentation and Classification Using Vision Transformer for Automatic Analysis in Dermatoscopy-Based Noninvasive Digital System

High-Performance Classification of Breast Cancer Histopathological Images Using Fine-Tuned Vision Transformers on the BreakHis Dataset

Vision transformer introduces a new vitality to the classification of renal pathology

Investigating the Robustness of Vision Transformers against Label Noise in Medical Image Classification

Automated retinal disease classification using hybrid transformer model (SViT) using optical coherence tomography images

Automated classification of choroidal neovascularization, diabetic macular edema, and drusen from retinal OCT images using vision transformers: a comparative study

HViT: Hybrid vision inspired transformer for the assessment of carotid artery plaque by addressing the cross-modality domain adaptation problem in MRI

Echoes of images: multi-loss network for image retrieval in vision transformers

VITALT: a robust and efficient brain tumor detection system using vision transformer with attention and linear transformation

Automated Ischemic Stroke Classification from MRI Scans: Using a Vision Transformer Approach