Abstract: Manual classification of primary brain tumors from Magnetic Resonance Imaging (MRI) is a critical task in clinical routine that requires highly qualified neuroradiologists. Deep Learning (DL)-based computer-aided diagnosis (CAD) tools have been developed to support the neurosurgeons' opinion during diagnosis. However, the black-box nature of such DL-based models, together with their lack of transparency and interpretability, makes their deployment very difficult, especially in critical and sensitive medical applications. Explainable artificial intelligence (XAI) techniques help to gain clinicians' confidence by providing explanations of the models' predictions. Typical Convolutional Neural Network (CNN)-based architectures cannot capture long-range global information and features from pathology MRI scans. Recently, Vision Transformer (ViT) networks have been introduced to address this long-range dependency issue through a self-attention mechanism, which allows the network to capture deep long-range dependencies between pixels. The purpose of this study is to provide an efficient CAD tool for MRI brain tumor classification. At the same time, we aim to enhance neuroradiologists' confidence when using DL in clinical practice. In this paper, we investigate a deep ViT architecture trained from scratch for the multi-class classification of common primary tumors (gliomas, meningiomas, and pituitary tumors), using T1-weighted contrast-enhanced MRI sequences. Several XAI techniques are adopted, namely Gradient-weighted Class Activation Mapping (Grad-CAM), Local Interpretable Model-agnostic Explanations (LIME), and SHapley Additive exPlanations (SHAP), to visualize the most significant and distinguishing features related to the model's predictions. A publicly available benchmark dataset is used for evaluation.
The comparative study confirms the efficiency of the ViT architecture relative to the CNN model on the test set: the CNN reaches a test accuracy of 83.37%, whereas the ViT reaches 91.61%. Based on the experimental results, the proposed ViT model achieves competitive performance, outperforming state-of-the-art multi-class classification models using MRI sequences. Furthermore, the proposed models yield accurate interpretations of their predictions. We therefore conclude that the proposed CAD tool could be adopted in clinical diagnosis routines.
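As a rough illustration of the self-attention idea mentioned above, the following pure-Python sketch splits a toy image into patches and lets every patch attend to every other patch in a single step. It is not the architecture evaluated in the paper: the helper names (`patchify`, `self_attention`), the 16x16 random "image", the patch size, and the random projection weights are all assumptions made for illustration, and components a real ViT would need (class token, positional embeddings, multiple heads, MLP blocks, training) are omitted.

```python
import math
import random


def patchify(image, patch_size):
    """Split a 2-D image (list of rows) into flattened, non-overlapping patches."""
    height, width = len(image), len(image[0])
    patches = []
    for top in range(0, height, patch_size):
        for left in range(0, width, patch_size):
            patches.append([image[top + i][left + j]
                            for i in range(patch_size)
                            for j in range(patch_size)])
    return patches


def matmul(a, b):
    """Plain matrix product of two lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)]
            for row in a]


def softmax(row):
    """Numerically stable softmax over one row of scores."""
    peak = max(row)
    exps = [math.exp(value - peak) for value in row]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(tokens, w_q, w_k, w_v):
    """Single-head scaled dot-product self-attention over patch tokens."""
    q, k, v = matmul(tokens, w_q), matmul(tokens, w_k), matmul(tokens, w_v)
    scale = math.sqrt(len(k[0]))
    k_transposed = [list(col) for col in zip(*k)]
    scores = matmul(q, k_transposed)
    # Each row holds one patch's attention weights over ALL patches,
    # which is how distant image regions can influence each other.
    weights = [softmax([s / scale for s in row]) for row in scores]
    return matmul(weights, v), weights


random.seed(0)
# Hypothetical stand-in for an MRI slice: a 16x16 grid of random intensities.
image = [[random.random() for _ in range(16)] for _ in range(16)]
tokens = patchify(image, 4)  # 16 patches, each flattened to 16 values

dim = 8  # assumed projection size, chosen arbitrarily
w_q, w_k, w_v = ([[random.gauss(0, 0.1) for _ in range(dim)]
                  for _ in range(len(tokens[0]))] for _ in range(3))
output, weights = self_attention(tokens, w_q, w_k, w_v)
```

Each row of `weights` sums to one and spans all 16 patches, so the first patch can draw on the last patch directly. A convolution with a small kernel would need many stacked layers to connect the same two regions.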

Vision transformers (ViT) and deep convolutional neural network (D-CNN)-based models for MRI brain primary tumors images multi-classification supported by explainable artificial intelligence (XAI)
