Abstract: Background: Medical image classification is crucial for accurate and efficient diagnosis, and deep learning frameworks have shown significant potential in this area. When a general-purpose deep learning model is deployed directly on a new dataset with heterogeneous features, the effect of domain shift is usually ignored, which degrades the model's performance and leads to inaccurate predictions. Purpose: This study proposes a framework that uses cross-modality domain adaptation to classify MRI scans and domain knowledge into stable and vulnerable plaque categories, using a modified Vision Transformer (ViT) model for the MRI scans and a transformer model for the domain knowledge. Methods: This study proposes a Hybrid Vision Inspired Transformer (HViT) framework that employs a convolutional layer for image pre-processing and normalization and a 3D convolutional layer to enable ViT to classify 3D images. The proposed HViT framework introduces a slim design with a multi-branch network and channel attention, improving patch embedding extraction and information learning. Auxiliary losses target shallow features and link them with deeper ones, enhancing information gain and model generalization. Furthermore, replacing the MLP head with an RNN enables better backpropagation for improved performance. To classify domain knowledge, we use a modified transformer model with LSTM positional encoding and GloVe word vectors. Using stacking ensemble learning with hard and soft prediction, we combine the predictive power of both models to address the cross-modality domain adaptation problem and improve overall performance.
Results: By addressing the cross-modality domain adaptation problem, the proposed framework achieved an accuracy of 94.32% in classifying carotid artery plaque as stable or vulnerable. Conclusion: The model was further evaluated on an independent dataset acquired with different hardware protocols. The results demonstrate that the proposed deep learning model significantly improves generalization across MRI scans acquired with different hardware protocols, without requiring additional calibration data.
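
The abstract mentions combining the image model and the domain-knowledge model with hard and soft prediction. The following is only a minimal sketch of those two combination rules, not the authors' stacking implementation (which would typically train a meta-learner on the base models' outputs). The function names, the class order, the equal default weights, and the tie-breaking rule are all assumptions made for illustration.

```python
from collections import Counter
from typing import List, Optional, Sequence

# Class order is an assumption for this sketch: index 0 = stable, 1 = vulnerable.
LABELS = ("stable", "vulnerable")


def soft_vote(prob_sets: Sequence[Sequence[float]],
              weights: Optional[Sequence[float]] = None) -> List[float]:
    """Soft prediction: weighted average of each model's class probabilities."""
    if weights is None:
        weights = [1.0] * len(prob_sets)  # equal weights (assumed default)
    total = sum(weights)
    n_classes = len(prob_sets[0])
    return [
        sum(w * p[c] for w, p in zip(weights, prob_sets)) / total
        for c in range(n_classes)
    ]


def hard_vote(prob_sets: Sequence[Sequence[float]]) -> int:
    """Hard prediction: majority vote over each model's predicted class."""
    votes = [max(range(len(p)), key=p.__getitem__) for p in prob_sets]
    counts = Counter(votes)
    top = max(counts.values())
    tied = sorted(c for c, n in counts.items() if n == top)
    if len(tied) == 1:
        return tied[0]
    # Tie between classes: fall back to the averaged probabilities
    # (a hypothetical tie-breaking rule, not taken from the paper).
    avg = soft_vote(prob_sets)
    return max(tied, key=lambda c: avg[c])


if __name__ == "__main__":
    image_probs = [0.40, 0.60]  # hypothetical output of the MRI image model
    text_probs = [0.90, 0.10]   # hypothetical output of the domain-knowledge model
    avg = soft_vote([image_probs, text_probs])
    print("soft:", LABELS[max(range(len(avg)), key=avg.__getitem__)])
    print("hard:", LABELS[hard_vote([image_probs, text_probs])])
```

With only two base models, hard voting ties whenever they disagree, which is one reason a soft or learned combination may be preferable in a two-branch setup like the one described.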

Fourier ViT: A Multi-scale Vision Transformer with Fourier Transform for Histopathological Image Classification

Vision Transformers for Computational Histopathology

DCT-HistoTransformer: Efficient Lightweight Vision Transformer with DCT Integration for histopathological image analysis

Vision transformer introduces a new vitality to the classification of renal pathology

CrossViT: Cross-Attention Multi-Scale Vision Transformer for Image Classification

Towards efficient diagnostics: refining vision transformers for medical image multi-label classification

Implementing vision transformer for classifying 2D biomedical images

HViT: Hybrid vision inspired transformer for the assessment of carotid artery plaque by addressing the cross-modality domain adaptation problem in MRI

A Novel Vision Transformer with Residual in Self-attention for Biomedical Image Classification

Vision transformer-based weakly supervised histopathological image analysis of primary brain tumors

MC-ViT: Multi-path cross-scale vision transformer for thymoma histopathology whole slide image typing

MIL-ViT: A Multiple Instance Vision Transformer for Fundus Image Classification

Vision transformer-convolution for breast cancer classification using mammography images: A comparative study

Vision Transformer for Classification of Breast Ultrasound Images

Vision Transformers for Small Histological Datasets Learned through Knowledge Distillation

High-Performance Classification of Breast Cancer Histopathological Images Using Fine-Tuned Vision Transformers on the BreakHis Dataset

Deep Hierarchical Vision Transformer for Hyperspectral and LiDAR Data Classification

Fractional Fourier Image Transformer for Multimodal Remote Sensing Data Classification

MedViT: A robust vision transformer for generalized medical image classification

Hierarchical Vision Transformers for Context-Aware Prostate Cancer Grading in Whole Slide Images

Improving Vision Transformers by Revisiting High-Frequency Components