Abstract: The manual classification of primary brain tumors from Magnetic Resonance Imaging (MRI) is a critical task in clinical routine that requires highly qualified neuroradiologists. Deep Learning (DL)-based computer-aided diagnosis (CAD) tools have been developed to support the neurosurgeons' opinion during diagnosis. However, the black-box nature of such DL-based models, together with their lack of transparency and interpretability, makes their deployment very difficult, especially in critical and sensitive medical applications. Explainable artificial intelligence (XAI) techniques help to gain clinicians' confidence by providing explanations of the models' predictions. Typical Convolutional Neural Network (CNN)-based architectures cannot capture long-range global information and features from pathology MRI scans. Recently, Vision Transformer (ViT) networks have been introduced to address this limitation of CNN-based architectures: they use a self-attention mechanism to analyze images, allowing the network to capture long-range dependencies between pixels. The purpose of this study is to provide an efficient CAD tool for MRI brain tumor classification. At the same time, we aim to enhance neuroradiologists' confidence when using DL in clinical and medical settings. In this paper, we investigate a deep ViT architecture trained from scratch for the multi-class classification of common primary tumors (gliomas, meningiomas, and pituitary tumors), using T1-weighted contrast-enhanced MRI sequences. Several XAI techniques are adopted to visualize the most significant and distinguishing features behind the model's predictions: Gradient-weighted Class Activation Mapping (Grad-CAM), Local Interpretable Model-agnostic Explanations (LIME), and SHapley Additive exPlanations (SHAP). A publicly available benchmark dataset is used for evaluation.
The comparative study confirms the efficiency of the ViT architecture compared to the CNN model on the test dataset: the CNN reaches a test accuracy of 83.37% and the ViT reaches 91.61%, so the ViT model outperforms the CNN model in the classification task. Based on the experimental results, the proposed ViT model is competitive with, and outperforms, state-of-the-art multi-class classification models on MRI sequences. Further, the proposed models provide accurate interpretations of their predictions. We therefore conclude that the proposed CAD tool could be used in clinical diagnosis routines.
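
The self-attention idea mentioned in the abstract can be illustrated with a minimal sketch. It is not the architecture evaluated in the study: the 8x8 random "image", the 4x4 patch size, the head dimension of 8, and the helper names (`split_into_patches`, `self_attention`, `rand_matrix`) are all assumptions made for illustration, and a real ViT additionally uses a learned patch embedding, position embeddings, a class token, and multiple heads and layers. The sketch only shows that each patch receives a non-zero attention weight for every other patch, which is how distant image regions can influence each other within a single layer.

```python
import math
import random


def split_into_patches(image, patch):
    """Split a 2D image (list of rows) into flattened, non-overlapping patches."""
    height, width = len(image), len(image[0])
    patches = []
    for r in range(0, height, patch):
        for c in range(0, width, patch):
            patches.append(
                [image[r + i][c + j] for i in range(patch) for j in range(patch)]
            )
    return patches


def matmul(a, b):
    """Plain matrix product of two lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def softmax(row):
    """Numerically stable softmax over one row of scores."""
    peak = max(row)
    exps = [math.exp(v - peak) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(tokens, w_q, w_k, w_v):
    """Single-head scaled dot-product self-attention over patch tokens."""
    q, k, v = matmul(tokens, w_q), matmul(tokens, w_k), matmul(tokens, w_v)
    d_head = len(q[0])
    k_t = [list(col) for col in zip(*k)]
    scores = matmul(q, k_t)  # one score per (query patch, key patch) pair
    weights = [softmax([s / math.sqrt(d_head) for s in row]) for row in scores]
    return matmul(weights, v), weights


rng = random.Random(0)


def rand_matrix(rows, cols):
    """Small random projection matrix (stands in for learned weights)."""
    return [[rng.gauss(0.0, 0.1) for _ in range(cols)] for _ in range(rows)]


# Hypothetical 8x8 single-channel "scan" filled with random intensities.
image = [[rng.random() for _ in range(8)] for _ in range(8)]
tokens = split_into_patches(image, 4)  # 4 patches, 16 values each

w_q, w_k, w_v = rand_matrix(16, 8), rand_matrix(16, 8), rand_matrix(16, 8)
output, weights = self_attention(tokens, w_q, w_k, w_v)

# Each row shows how strongly one patch attends to every patch in the image.
for i, row in enumerate(weights):
    print(i, [round(w, 3) for w in row])
```

Because the softmax assigns a strictly positive weight to every patch pair, the top-left patch can draw on the bottom-right patch directly, whereas a convolution with a small kernel would need several stacked layers to connect them.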

Video and Synthetic MRI Pre-training of 3D Vision Architectures for Neuroimage Analysis

Efficiently Training Vision Transformers on Structural MRI Scans for Alzheimer's Disease Detection

Video Pretraining Advances 3D Deep Learning on Chest CT Tasks

Using Vision Transformers in 3-D Medical Image Classifications

Introducing Vision Transformer for Alzheimer's Disease classification task with 3D input

Pretrained ViTs Yield Versatile Representations For Medical Images

BrainMVP: Multi-modal Vision Pre-training for Brain Image Analysis using Multi-parametric MRI

Video4MRI: An Empirical Study on Brain Magnetic Resonance Image Analytics with CNN-based Video Classification Frameworks

Vision transformer-equipped Convolutional Neural Networks for automated Alzheimer's disease diagnosis using 3D MRI scans

Medical Transformer: Universal Brain Encoder for 3D MRI Analysis

Vision Transformers and Bi-LSTM for Alzheimer's Disease Diagnosis from 3D MRI

Joint transformer architecture in brain 3D MRI classification: its application in Alzheimer's disease classification

Transferring Models Trained on Natural Images to 3D MRI via Position Encoded Slice Models

Vision Mamba: Cutting-Edge Classification of Alzheimer's Disease with 3D MRI Scans

Domain Aware Multi-Task Pretraining of 3D Swin Transformer for T1-weighted Brain MRI

Vision transformers (ViT) and deep convolutional neural network (D-CNN)-based models for MRI brain primary tumors images multi-classification supported by explainable artificial intelligence (XAI)

Transfer Learning with intelligent training data selection for prediction of Alzheimer's Disease

ViTAD: Leveraging Modified Vision Transformer for Alzheimer's Disease Multi-Stage Classification from Brain MRI Scans

Medical Vision-Language Pre-Training for Brain Abnormalities

MoViT: Memorizing Vision Transformers for Medical Image Analysis

Explainable early detection of Alzheimer's disease using ROIs and an ensemble of 138 3D vision transformers