Abstract: Grading laryngeal squamous cell carcinoma (LSCC) from histopathological images is a clinically significant yet challenging task. A large amount of low-effect background semantic information appears in the feature maps, feature channels, and class activation maps, which seriously degrades the accuracy and interpretability of LSCC grading. Because the traditional transformer block makes extensive use of parameter attention, the model overlearns this low-effect background semantic information and fails to reduce the proportion of background semantics effectively. We therefore propose an end-to-end network with transformers constrained by learned-parameter-free attention (LA-ViT), which improves the ability to learn high-effect target semantic information and reduces the proportion of background semantics. First, using generalized linear models and probability theory, we demonstrate that learned-parameter-free attention (LA) has a stronger ability to learn high-effect target semantic information than parameter attention. Second, the first-type LA transformer block of LA-ViT uses the feature map position subspace to realize the query, uses the feature channel subspace to realize the key, and adopts average convergence to obtain the value; together these construct the LA mechanism and reduce the proportion of background semantics in the feature maps and feature channels. Third, the second-type LA transformer block of LA-ViT uses the model probability matrix information and the decision-level weight information to realize the key and query, respectively; these realize the LA mechanism and reduce the proportion of background semantics in class activation maps. Finally, we build a new LSCC pathology image dataset with complex semantics to address the scarcity of research on LSCC grading models, which stems from the lack of clinically meaningful datasets.
In extensive experiments, LA-ViT outperforms other state-of-the-art methods on all metrics, and its visualization maps match the regions of interest in pathologists' decision-making more closely. Moreover, experiments on a public LSCC pathology image dataset show that LA-ViT generalizes better than other state-of-the-art methods.
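
The abstract does not give the equations of the LA mechanism, so the following is only a minimal sketch of what an attention step without learned parameters could look like. It is not the authors' implementation. The function name `parameter_free_attention`, the use of softmax, and the reading of "average convergence" as a plain global mean are all assumptions made for illustration.

```python
import numpy as np


def softmax(x):
    """Numerically stable softmax over a 1-D array."""
    e = np.exp(x - np.max(x))
    return e / np.sum(e)


def parameter_free_attention(x):
    """Reweight a (C, H, W) feature map using only statistics of x itself.

    Hypothetical illustration: no weight matrices are learned or stored.
    """
    c, h, w = x.shape
    # "Query" taken from the position subspace: one value per spatial position.
    query = x.mean(axis=0)                # shape (H, W)
    # "Key" taken from the channel subspace: one value per channel.
    key = x.mean(axis=(1, 2))             # shape (C,)
    # "Value" assumed here to be the global average of the feature map.
    value = x.mean()                      # scalar
    # Positions and channels that respond above the global average get more weight.
    spatial_w = softmax((query - value).ravel()).reshape(h, w)
    channel_w = softmax(key - value)
    # Rescale by C*H*W so that uniform weights leave the input unchanged.
    out = x * channel_w[:, None, None] * spatial_w[None, :, :] * (c * h * w)
    return out, spatial_w, channel_w


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    features = rng.normal(size=(4, 8, 8))
    out, spatial_w, channel_w = parameter_free_attention(features)
    print(out.shape, spatial_w.sum(), channel_w.sum())
```

Because every weight is computed from the input alone, the step adds no trainable parameters. A feature map with no structure (a constant input) passes through unchanged in this sketch.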

LGVIT: Local-Global Vision Transformer for Breast Cancer Histopathological Image Classification

A VGG Attention Vision Transformer Network for Benign and Malignant Classification of Breast Ultrasound Images

Unified Local and Global Attention Interaction Modeling for Vision Transformers

CViTS-Net: A CNN-ViT Network With Skip Connections for Histopathology Image Classification

Vision Transformer for Classification of Breast Ultrasound Images

Local-to-Global Self-Attention in Vision Transformers

Vision transformer-convolution for breast cancer classification using mammography images: A comparative study

Global-Local Attention Network for Weakly Supervised Cervical Cytology ROI Analysis

LA-ViT: A Network With Transformers Constrained by Learned-Parameter-Free Attention for Interpretable Grading in a New Laryngeal Histopathology Image Dataset

Semi-supervised vision transformer with adaptive token sampling for breast cancer classification

Supervised Contrastive Vision Transformer for Breast Histopathological Image Classification

Histopathological Image Classification based on Self-Supervised Vision Transformer and Weak Labels

AResNet-ViT: A Hybrid CNN-Transformer Network for Benign and Malignant Breast Nodule Classification in Ultrasound Images

Convolutional Neural Network–Vision Transformer Architecture with Gated Control Mechanism and Multi-Scale Fusion for Enhanced Pulmonary Disease Classification

CB-HVTNet: A channel-boosted hybrid vision transformer network for lymphocyte assessment in histopathological images

LC2R-ViT: Long-Range Cross-Residual Vision Transformer for Medical Image Classification

PLG-ViT: Vision Transformer with Parallel Local and Global Self-Attention

Cross-Attention Based Multi-Scale Feature Fusion Vision Transformer for Breast Ultrasound Image Classification

Multi-branch CNN and grouping cascade attention for medical image classification

RegionViT: Regional-to-Local Attention for Vision Transformers

A Multi-Task Transformer with Local-Global Feature Interaction and Multiple Tumoral Region Guidance for Breast Cancer Diagnosis