Abstract:
Objectives: Accurate histological typing plays an important role in diagnosing thymoma or thymic carcinoma (TC) and in predicting prognosis. In this paper, we develop and validate a deep learning-based thymoma typing method for hematoxylin & eosin (H&E)-stained whole slide images (WSIs), which extracts histopathology information from patient slides to assist doctors in diagnosing thymoma or TC.
Methods: We propose a multi-path cross-scale vision transformer (MC-ViT), which first uses the cross attentive scale-aware transformer (CAST) to classify the pathological information related to thymoma, and then uses this pathological information as a prior to assist the WSIs transformer (WT) in thymoma typing. To make full use of the multi-scale (10×, 20×, and 40×) information inherent in a WSI, CAST not only employs parallel multi-path branches to capture features with different receptive fields from multi-scale WSI inputs, but also introduces the cross-correlation attention module (CAM) to aggregate multi-scale features so that spatial information from different scales complements each other. WT then converts full-scale WSIs into 1D feature matrices with pathological information labels to improve the efficiency and accuracy of thymoma typing.
Results: We construct a large-scale thymoma histopathology WSI (THW) dataset and annotate it with the corresponding pathological information and thymoma typing labels. The proposed MC-ViT achieves Top-1 accuracies of 0.939 and 0.951 in pathological information classification and thymoma typing, respectively. Moreover, the quantitative and statistical experiments on the THW dataset demonstrate that our pipeline performs favorably against existing classical convolutional neural networks, vision transformers, and deep learning-based medical image classification methods.
Conclusion: This paper demonstrates that comprehensively utilizing the pathological information contained in multi-scale WSIs is feasible for thymoma typing and achieves clinically acceptable performance. Specifically, the proposed MC-ViT predicts both pathological information classes and thymoma types well, which shows its potential for application in the diagnosis of thymoma and TC and may help doctors improve diagnostic efficiency and accuracy.
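The idea of aggregating features across the 10×, 20×, and 40× scales can be illustrated with a minimal sketch. This is not the CAM described in the abstract, whose internals are not given here; it is plain scaled dot-product cross-attention without learned projections, and the names `cross_scale_attention` and `fuse_scales`, the token counts, and the feature width are all assumptions chosen for illustration.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def cross_scale_attention(query_tokens, context_tokens):
    # Tokens of one scale (queries) attend to tokens of another scale.
    # Hypothetical simplification: no learned Q/K/V projections.
    d = query_tokens.shape[-1]
    scores = query_tokens @ context_tokens.T / np.sqrt(d)  # (Nq, Nc)
    weights = softmax(scores, axis=-1)                      # rows sum to 1
    return weights @ context_tokens                         # (Nq, d)

def fuse_scales(feats):
    # feats maps a scale name to an (N_s, d) token matrix.
    # Each scale keeps its own tokens and adds the mean of what it
    # gathers from every other scale (a residual-style fusion).
    fused = {}
    for scale, q in feats.items():
        gathered = [cross_scale_attention(q, c)
                    for other, c in feats.items() if other != scale]
        fused[scale] = q + np.mean(gathered, axis=0)
    return fused

# Assumed toy inputs: higher magnification yields more patch tokens.
rng = np.random.default_rng(0)
feats = {
    "10x": rng.normal(size=(4, 8)),
    "20x": rng.normal(size=(16, 8)),
    "40x": rng.normal(size=(64, 8)),
}
fused = fuse_scales(feats)
```

Each scale keeps its own token count after fusion, so the per-scale paths can continue in parallel while still carrying information from the other magnifications.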

CViTS-Net: A CNN-ViT Network With Skip Connections for Histopathology Image Classification

Artificial Classification of Cervical Squamous Lesions in ThinPrep Cytologic Tests Using a Deep Convolutional Neural Network

AResNet-ViT: A Hybrid CNN-Transformer Network for Benign and Malignant Breast Nodule Classification in Ultrasound Images

Vision transformer introduces a new vitality to the classification of renal pathology

DCT-HistoTransformer: Efficient Lightweight Vision Transformer with DCT Integration for histopathological image analysis

MMViT-Seg: A Lightweight Transformer and CNN Fusion Network for COVID-19 Segmentation

From modern CNNs to vision transformers: Assessing the performance, robustness, and classification strategies of deep learning models in histopathology

CB-HVTNet: A channel-boosted hybrid vision transformer network for lymphocyte assessment in histopathological images

MC-ViT: Multi-path cross-scale vision transformer for thymoma histopathology whole slide image typing

ASI-DBNet: An Adaptive Sparse Interactive ResNet-Vision Transformer Dual-Branch Network for the Grading of Brain Cancer Histopathological Images

Large Scale Tissue Histopathology Image Classification, Segmentation, and Visualization Via Deep Convolutional Activation Features

Vision Transformer for Efficient Chest X-ray and Gastrointestinal Image Classification

Deep Transfer Learning for Histopathological Diagnosis of Cervical Cancer Using Convolutional Neural Networks with Visualization Schemes

Vision transformer-convolution for breast cancer classification using mammography images: A comparative study

Enhancing cervical cancer diagnosis: Integrated attention-transformer system with weakly supervised learning

Automated retinal disease classification using hybrid transformer model (SViT) using optical coherence tomography images

An improved transformer network for skin cancer classification

Hierarchical Deep Convolutional Neural Networks for Multi-category Diagnosis of Gastrointestinal Disorders on Histopathological Images

Histopathological image classification with deep convolutional neural networks

ViT-CB: Integrating hybrid Vision Transformer and CatBoost to enhanced brain tumor detection with SHAP

High-Performance Classification of Breast Cancer Histopathological Images Using Fine-Tuned Vision Transformers on the BreakHis Dataset