Abstract: In recent years, the integration of advanced imaging techniques and deep learning methods has significantly advanced computer-aided diagnosis (CAD) systems for breast cancer detection and classification. Transformers, which have shown great promise in computer vision, are now being applied to medical image analysis. However, applying them to histopathological images is challenging: these models require large amounts of training data, and the extensive manual annotation of whole-slide images (WSIs) that this entails is costly and time-consuming. Furthermore, the computational cost of self-attention in Vision Transformers (ViTs) grows quadratically with the number of image tokens, which is particularly prohibitive for large, high-resolution histopathological images, especially on edge devices with limited computational resources. In this study, we introduce a novel lightweight transformer-based approach to breast cancer classification that operates effectively without large datasets. By incorporating parallel processing pathways for Discrete Cosine Transform (DCT) Attention and MobileConv, we convert image data from the spatial domain to the frequency domain, which allows high-frequency components of the image to be filtered out and thereby reduces computational cost. Our proposed model achieves an accuracy of 96.00% $\pm$ 0.48% for binary classification and 87.85% $\pm$ 0.93% for multiclass classification, which is comparable to state-of-the-art models at a substantially lower computational cost. This demonstrates the potential of our approach to improve breast cancer classification in histopathological images, offering a more efficient solution with reduced reliance on extensive annotated datasets.
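The frequency-domain step described above can be illustrated with a minimal NumPy sketch. It assumes an orthonormal 2D DCT-II applied over a square grid of patch tokens and a simple top-left truncation as the "filter out high frequencies" step; the helper names (`dct_matrix`, `low_pass_tokens`, `reconstruct`, `attention_macs`), the 14×14 grid, and the 64-channel width are hypothetical choices for illustration, not the authors' DCT Attention or MobileConv modules. Because self-attention cost scales with the square of the token count N, keeping the 7×7 lowest-frequency coefficients of a 14×14 grid cuts N by 4× and the attention cost by roughly 16×.

```python
import numpy as np


def dct_matrix(n: int) -> np.ndarray:
    """Orthonormal DCT-II basis; row k is the k-th cosine basis vector."""
    k = np.arange(n)[:, None]          # frequency index
    i = np.arange(n)[None, :]          # spatial index
    c = np.sqrt(2.0 / n) * np.cos(np.pi * (2 * i + 1) * k / (2 * n))
    c[0, :] = np.sqrt(1.0 / n)         # DC row uses the smaller scale factor
    return c


def dct2(x: np.ndarray) -> np.ndarray:
    """2D DCT over the two spatial axes of a square (H, W, C) token grid."""
    c = dct_matrix(x.shape[0])         # assumes H == W
    return np.einsum("kh,hwc,lw->klc", c, x, c)


def idct2(y: np.ndarray) -> np.ndarray:
    """Inverse of dct2; valid because the basis is orthonormal (C^-1 = C^T)."""
    c = dct_matrix(y.shape[0])
    return np.einsum("kh,klc,lw->hwc", c, y, c)


def low_pass_tokens(x: np.ndarray, keep: int) -> np.ndarray:
    """Keep only the keep x keep lowest-frequency coefficients per channel."""
    return dct2(x)[:keep, :keep, :]


def reconstruct(coeffs: np.ndarray, size: int) -> np.ndarray:
    """Zero-pad truncated coefficients back to size x size and invert."""
    full = np.zeros((size, size, coeffs.shape[2]))
    k = coeffs.shape[0]
    full[:k, :k, :] = coeffs
    return idct2(full)


def attention_macs(num_tokens: int, dim: int) -> int:
    """Approximate multiply-accumulates for Q K^T plus softmax(.) V: 2 * N^2 * d."""
    return 2 * num_tokens ** 2 * dim


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    grid = rng.standard_normal((14, 14, 64))   # hypothetical 14x14 patch grid, 64 channels
    reduced = low_pass_tokens(grid, keep=7)
    n_full, n_low = 14 * 14, reduced.shape[0] * reduced.shape[1]
    print(n_full, n_low)                                            # 196 49
    print(attention_macs(n_full, 64) // attention_macs(n_low, 64))  # 16
```

Truncating in the DCT domain discards only the fine, high-frequency detail: a smooth signal built from low-frequency basis vectors is recovered exactly by `reconstruct`, while the attention that follows operates on 4× fewer tokens.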

DCT-HistoTransformer: Efficient Lightweight Vision Transformer with DCT Integration for histopathological image analysis