DCT-HistoTransformer: Efficient Lightweight Vision Transformer with DCT Integration for histopathological image analysis

Mahtab Ranjbar, Mehdi Mohebbi, Mahdi Cherakhloo, Bijan Vosoughi Vahdat
2024-10-25
Abstract: In recent years, the integration of advanced imaging techniques and deep learning methods has significantly advanced computer-aided diagnosis (CAD) systems for breast cancer detection and classification. Transformers, which have shown great promise in computer vision, are now being applied to medical image analysis. However, their application to histopathological images presents challenges: these models require large amounts of data to work effectively, and the extensive manual annotation of whole-slide images (WSIs) this demands is costly and time-consuming. Furthermore, the quadratic computational cost of Vision Transformers (ViTs) is particularly prohibitive for large, high-resolution histopathological images, especially on edge devices with limited computational resources. In this study, we introduce a novel lightweight breast cancer classification approach using transformers that operates effectively without large datasets. By incorporating parallel processing pathways for Discrete Cosine Transform (DCT) Attention and MobileConv, we convert image data from the spatial domain to the frequency domain, where high frequencies in the image can be filtered out, which reduces computational cost. Our proposed model achieves an accuracy of 96.00% $\pm$ 0.48% for binary classification and 87.85% $\pm$ 0.93% for multiclass classification, which is comparable to state-of-the-art models while significantly reducing computational costs. This demonstrates the potential of our approach to improve breast cancer classification in histopathological images, offering a more efficient solution with reduced reliance on extensive annotated datasets.
Computer Vision and Pattern Recognition
What problem does this paper attempt to address?
The paper attempts to solve two main problems:

1. **Reduce the dependence on large amounts of labeled data**: In medical image analysis, especially for histopathological images (such as breast cancer tissue sections), obtaining large-scale labeled data is expensive and time-consuming. Existing deep-learning models usually require a large amount of labeled data for training, which is a major obstacle in practical applications.
2. **Reduce the computational cost**: A standard Vision Transformer (ViT) is very expensive on high-resolution images because self-attention has quadratic complexity ($O(L^2)$, where $L$ is the sequence length), and the cost is especially limiting on edge devices. This restricts the use of ViTs in real medical settings.

To address these challenges, the authors propose a new lightweight model, DCT-HistoTransformer, which relies on two main ideas:

1. **Use a frequency transform to reduce the input size**: The Discrete Cosine Transform (DCT) converts spatial data into the frequency domain, where high-frequency components are removed and the input shrinks accordingly. This reduces the computational burden while retaining the important low-frequency information, which helps classification accuracy.
2. **Add local feature capture**: A MobileConv branch lets the model keep capturing local features even though the input to the attention branch has been reduced, so the model can process data efficiently without giving up performance.

Specifically, DCT-HistoTransformer contains two parallel processing paths: the DCT-Attention branch and the MobileConv branch. The DCT-Attention branch converts the image from the spatial domain to the frequency domain and removes high-frequency components with a low-pass filter, thereby reducing the computational cost. The MobileConv branch captures local features to compensate for what the global features miss.

The experimental results show that the model achieves an accuracy of 96.00% ± 0.48% on binary classification and 87.85% ± 0.93% on multiclass classification, while significantly reducing the computational cost. These results demonstrate the potential of the model for breast cancer classification, in particular its efficiency and accuracy on high-resolution histopathological images.
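The low-pass step of the DCT-Attention branch, and its effect on the quadratic attention cost, can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function names, the 14 x 14 feature-map size, and the choice to keep a 7 x 7 block of coefficients are all assumptions made for illustration, and the summary above does not specify how the real model chooses its cutoff.

```python
import numpy as np


def dct_matrix(n: int) -> np.ndarray:
    """Orthonormal DCT-II matrix C of shape (n, n); C @ x is the 1D DCT of x."""
    k = np.arange(n)[:, None]            # frequency index (rows)
    i = np.arange(n)[None, :]            # sample index (columns)
    c = np.cos(np.pi * (2 * i + 1) * k / (2 * n)) * np.sqrt(2.0 / n)
    c[0, :] /= np.sqrt(2.0)              # rescale the DC row so that C @ C.T == I
    return c


def dct_low_pass(x: np.ndarray, keep: int) -> np.ndarray:
    """2D DCT of a square map x, keeping only the keep x keep lowest frequencies."""
    c = dct_matrix(x.shape[0])
    coeffs = c @ x @ c.T                 # separable 2D DCT-II
    return coeffs[:keep, :keep]          # the top-left block holds the low frequencies


def idct_from_low_pass(low: np.ndarray, n: int) -> np.ndarray:
    """Zero-pad the kept coefficients back to n x n and invert the DCT."""
    padded = np.zeros((n, n))
    padded[: low.shape[0], : low.shape[1]] = low
    c = dct_matrix(n)
    return c.T @ padded @ c


def attention_pairs(seq_len: int) -> int:
    """Number of query-key pairs in full self-attention (the O(L^2) term)."""
    return seq_len * seq_len


# Hypothetical sizes: a 14 x 14 token grid reduced to 7 x 7 frequency tokens.
n, keep = 14, 7
_, cols = np.mgrid[0:n, 0:n]
smooth = np.cos(np.pi * (2 * cols + 1) / (2 * n))   # purely low-frequency pattern

low = dct_low_pass(smooth, keep)
restored = idct_from_low_pass(low, n)

# Sequence length drops from 196 to 49, so pairwise attention work drops 16x.
cost_ratio = attention_pairs(n * n) / attention_pairs(keep * keep)
print(low.shape, cost_ratio)
```

Because the test pattern contains only a low-frequency component, truncating to the 7 x 7 block loses nothing and `restored` matches `smooth`. Real tissue images also carry high-frequency detail that the truncation discards, which is the gap the parallel MobileConv branch is described as covering.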