Abstract: In recent years, the integration of advanced imaging techniques and deep learning methods has significantly advanced computer-aided diagnosis (CAD) systems for breast cancer detection and classification. Transformers, which have shown great promise in computer vision, are now being applied to medical image analysis. However, their application to histopathological images is challenging: these models need large amounts of data to work effectively, and obtaining it requires extensive manual annotation of whole-slide images (WSIs), which is costly and time-consuming. Furthermore, the quadratic computational cost of Vision Transformers (ViTs) is particularly prohibitive for large, high-resolution histopathological images, especially on edge devices with limited computational resources. In this study, we introduce a novel lightweight breast cancer classification approach using transformers that operates effectively without large datasets. By incorporating parallel processing pathways for Discrete Cosine Transform (DCT) Attention and MobileConv, we convert image data from the spatial domain to the frequency domain, which lets us filter out high frequencies in the image and thereby reduce computational cost. Our proposed model achieves an accuracy of 96.00% $\pm$ 0.48% for binary classification and 87.85% $\pm$ 0.93% for multiclass classification, which is comparable to state-of-the-art models while significantly reducing computational costs. This demonstrates the potential of our approach to improve breast cancer classification in histopathological images, offering a more efficient solution with reduced reliance on extensive annotated datasets.

What problem does this paper attempt to address?

This paper attempts to solve two main problems:

1. **Dependence on large amounts of labeled data**: In medical image analysis, especially for histopathological images (such as breast cancer tissue sections), obtaining large-scale labeled data is expensive and time-consuming. Existing deep learning models usually need a large amount of labeled data for training, which is a major obstacle in practical applications.
2. **Computational cost**: The standard Vision Transformer (ViT) is very expensive on high-resolution images because its self-attention has quadratic complexity ($O(L^2)$, where $L$ is the sequence length). The cost is especially limiting on edge devices and restricts the use of ViTs in real medical settings.

To address these challenges, the authors propose a new lightweight model, DCT-HistoTransformer, which relies on two main ideas:

1. **Frequency transformation to reduce the input size**: The Discrete Cosine Transform (DCT) converts spatial data into the frequency domain, where high-frequency components are removed. This shrinks the input, which lowers the computational burden, while retaining the important low-frequency information that helps classification accuracy.
2. **Local feature capture**: A MobileConv branch lets the model keep capturing local features even though the input to the attention path has been reduced, so the model can process data efficiently without giving up performance.

Specifically, DCT-HistoTransformer contains two parallel processing paths: the DCT-Attention branch and the MobileConv branch. The DCT-Attention branch converts the image from the spatial domain to the frequency domain and removes high-frequency components with a low-pass filter, thereby reducing the computational cost. The MobileConv branch captures local features, complementing the global features of the attention branch.

The experimental results show that the model achieves an accuracy of 96.00% ± 0.48% on the binary classification task and 87.85% ± 0.93% on the multiclass classification task, while significantly reducing the computational cost. These results demonstrate the potential of the model for breast cancer classification, especially its efficiency and accuracy on high-resolution histopathological images.
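The frequency-domain reduction described above can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the function names (`dct_matrix`, `dct_low_pass`, `attention_pairs`), the square single-channel input, the hard cut-off that keeps only the lowest `keep x keep` coefficients, and the 224-pixel image with 16-pixel patches are all assumptions chosen for illustration. The summary above does not specify the paper's actual filter or patch settings.

```python
import numpy as np


def dct_matrix(n: int) -> np.ndarray:
    """Orthonormal DCT-II basis as an (n, n) matrix; row k holds frequency k."""
    k = np.arange(n)[:, None]  # frequency index
    i = np.arange(n)[None, :]  # sample index
    mat = np.sqrt(2.0 / n) * np.cos(np.pi * (2 * i + 1) * k / (2 * n))
    mat[0, :] /= np.sqrt(2.0)  # rescale the DC row so that mat @ mat.T == I
    return mat


def dct_low_pass(image: np.ndarray, keep: int) -> np.ndarray:
    """2D DCT of a square image, keeping only the keep x keep lowest frequencies.

    Hypothetical hard low-pass filter: everything outside the top-left
    block of the coefficient matrix is discarded.
    """
    n = image.shape[0]
    c = dct_matrix(n)
    coeffs = c @ image @ c.T  # separable 2D DCT: transform rows, then columns
    return coeffs[:keep, :keep]


def attention_pairs(side: int, patch: int) -> int:
    """Number of token pairs (L^2) that self-attention scores for a square input."""
    tokens = (side // patch) ** 2  # sequence length L
    return tokens * tokens


# Assumed sizes for illustration only: a 224 x 224 input, 16 x 16 patches,
# and a filter that keeps half of the frequencies along each axis.
rng = np.random.default_rng(0)
image = rng.random((224, 224))
low = dct_low_pass(image, keep=112)

full_cost = attention_pairs(224, 16)
reduced_cost = attention_pairs(low.shape[0], 16)
print(low.shape, full_cost, reduced_cost, full_cost / reduced_cost)
```

Under these assumptions, halving the kept frequencies along each axis quarters the sequence length $L$, so the $O(L^2)$ attention term shrinks by a factor of 16. A hard cut-off is only one possible low-pass filter; a smoother mask over the coefficients would fit the same description.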

DCT-HistoTransformer: Efficient Lightweight Vision Transformer with DCT Integration for histopathological image analysis

Vision Transformers for Computational Histopathology

Vision transformer-convolution for breast cancer classification using mammography images: A comparative study

Vision Transformers for Small Histological Datasets Learned through Knowledge Distillation

Supervised Contrastive Vision Transformer for Breast Histopathological Image Classification

Hierarchical Vision Transformers for Context-Aware Prostate Cancer Grading in Whole Slide Images

Detection of breast cancer in digital breast tomosynthesis with vision transformers

Pathological Insights: Enhanced Vision Transformers for the Early Detection of Colorectal Cancer

Vision Transformer for Efficient Chest X-ray and Gastrointestinal Image Classification

Implementing vision transformer for classifying 2D biomedical images

High-Performance Classification of Breast Cancer Histopathological Images Using Fine-Tuned Vision Transformers on the BreakHis Dataset

MC-ViT: Multi-path cross-scale vision transformer for thymoma histopathology whole slide image typing

Vision Transformer for Classification of Breast Ultrasound Images

Masked pre-training of transformers for histology image analysis

Benchmarking Image Transformers for Prostate Cancer Detection from Ultrasound Data

Skin Cancer Segmentation and Classification Using Vision Transformer for Automatic Analysis in Dermatoscopy-Based Noninvasive Digital System

Kernel Attention Transformer for Histopathology Whole Slide Image Analysis and Assistant Cancer Diagnosis

RDTNet: A residual deformable attention based transformer network for breast cancer classification

Fourier ViT: A Multi-scale Vision Transformer with Fourier Transform for Histopathological Image Classification

Advances in Medical Image Analysis with Vision Transformers: A Comprehensive Review

SMiT: Symmetric Mask Transformer for Disease Severity Detection