LGViT: Local-Global Vision Transformer for Breast Cancer Histopathological Image Classification

Lang Wang, Juan Liu, Peng Jiang, Dehua Cao, Baochuan Pang
DOI: https://doi.org/10.1109/icassp49357.2023.10096781
2023-01-01
Abstract: Breast cancer histopathological image classification has made great progress with the use of Convolutional Neural Networks (CNNs). However, because of their limited receptive field, CNNs have difficulty learning the global information in breast cancer histopathological images, which hinders further improvement on this task. To solve this problem, we apply the self-attention mechanism to this task and propose a new network called Local-Global Vision Transformer (LGViT), which uses CNNs to capture local features and self-attention to learn global features of histopathological images. LGViT has several advantages: (1) We propose Local-Global Multi-head Self-attention, a new mechanism that models long-range dependencies at low computational cost. In this mechanism, self-attention is first performed separately within each window. Then, a Multiple Instance Learning scheme is used to obtain a representative token for each window. Finally, we compute self-attention among these representative tokens to capture global information. (2) We propose the Ghost Feed-forward Network, which compensates for the Vision Transformer's weakness in capturing local features by means of a locality mechanism. (3) We use a CNN stem to effectively capture low-level information. Experiments on the PatchCamelyon dataset show that LGViT outperforms other state-of-the-art methods.
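The three steps of Local-Global Multi-head Self-attention described in the abstract (window attention, one representative token per window, attention among representatives) can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes a single head, 1D windows over a flat token list, no learned query/key/value projections, and an attention-style Multiple Instance Learning pooling driven by a hypothetical `score_vector`; the abstract does not specify any of these details. The helper `attention_pairs` is likewise an illustrative addition that counts query-key pairs to show where the cost saving could come from.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def weighted_sum(weights, vectors):
    """Convex combination of equally sized vectors."""
    dim = len(vectors[0])
    return [sum(w * v[i] for w, v in zip(weights, vectors)) for i in range(dim)]


def self_attention(tokens):
    """Single-head scaled dot-product attention with Q = K = V = tokens.

    Assumption: learned projections are omitted to keep the sketch short.
    """
    scale = math.sqrt(len(tokens[0]))
    out = []
    for q in tokens:
        weights = softmax([dot(q, k) / scale for k in tokens])
        out.append(weighted_sum(weights, tokens))
    return out


def mil_pool(tokens, score_vector):
    """Assumed MIL pooling: treat a window as a bag and return one
    representative token as a softmax-weighted average of its tokens."""
    weights = softmax([dot(t, score_vector) for t in tokens])
    return weighted_sum(weights, tokens)


def local_global_attention(tokens, window_size, score_vector):
    """Return (per-window local outputs, globally attended representatives)."""
    if len(tokens) % window_size != 0:
        raise ValueError("token count must be divisible by window_size")
    windows = [tokens[i:i + window_size]
               for i in range(0, len(tokens), window_size)]
    # Step 1: self-attention separately within each window.
    local = [self_attention(w) for w in windows]
    # Step 2: one representative token per window.
    reps = [mil_pool(w, score_vector) for w in local]
    # Step 3: self-attention among the representative tokens.
    global_reps = self_attention(reps)
    return local, global_reps


def attention_pairs(num_tokens, window_size):
    """Query-key pair counts: (full attention, local + global attention)."""
    num_windows = num_tokens // window_size
    full = num_tokens ** 2
    local_global = num_windows * window_size ** 2 + num_windows ** 2
    return full, local_global


if __name__ == "__main__":
    toy_tokens = [[float(i), 1.0] for i in range(16)]
    local_out, global_out = local_global_attention(toy_tokens, 4, [0.5, 0.5])
    print(len(local_out), len(global_out))
    print(attention_pairs(16, 4))
```

Under these assumptions, 16 tokens split into windows of 4 need 80 query-key pairs (64 local plus 16 global) instead of the 256 required by full self-attention. How the global representatives are fed back to the tokens of each window is not stated in the abstract, so the sketch simply returns both results.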