Fourier ViT: A Multi-scale Vision Transformer with Fourier Transform for Histopathological Image Classification

Hufei Duan, Yiqing Liu, Hui Yan, Qiming He, Yonghong He, Tian Guan
DOI: https://doi.org/10.1109/cacre54574.2022.9834158
2022-01-01
Abstract: Histopathological examination is regarded as the gold standard for cancer diagnosis, so accurate classification of these medical images is essential. Notably, histopathological images are far less structured than natural images. Consequently, directly applying deep learning approaches designed for natural-image classification to histopathological images overlooks some key variations and leads to inaccurate results. The Fourier transform, by contrast, can extract subtle, inconspicuous features from an image. In addition, the Vision Transformer (ViT) has proven effective at modeling the relationship between the whole image and its local regions. The synergy between the Fourier transform and ViT may therefore greatly benefit the classification of unstructured histopathological images, yet existing work has neglected it. To this end, this paper proposes Fourier ViT, a universal attention-to-details architecture for fine-grained representation and classification. Furthermore, the paper implements a mixed attention algorithm to enhance the model’s attention. Extensive experiments show that our method outperforms classic CNN-based and Transformer-based models.
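The abstract does not say how the Fourier transform is integrated into the ViT. The NumPy code below is a purely illustrative sketch, not the authors' method. It shows the basic operation such a design can build on: using the 2D FFT to separate an image patch into a smooth low-frequency component and a fine-grained high-frequency component, which carries the kind of subtle texture detail the abstract refers to. The function name `split_frequency_bands`, the circular low-pass mask, and the `radius` value are hypothetical choices made for illustration.

```python
import numpy as np

# Illustrative sketch only; NOT the Fourier ViT implementation.
# The function name and the circular-mask design are hypothetical.
def split_frequency_bands(patch: np.ndarray, radius: float) -> tuple[np.ndarray, np.ndarray]:
    """Split a 2D grayscale patch into low- and high-frequency components.

    Frequency bins within `radius` of the zero-frequency (DC) term of the
    centred spectrum are treated as "low"; all other bins are "high".
    """
    h, w = patch.shape
    # 2D FFT, then shift so the DC term sits at index (h // 2, w // 2).
    spectrum = np.fft.fftshift(np.fft.fft2(patch))
    # Euclidean distance of every frequency bin from the centred DC term.
    yy, xx = np.ogrid[:h, :w]
    dist = np.sqrt((yy - h // 2) ** 2 + (xx - w // 2) ** 2)
    low_mask = dist <= radius
    # Undo the shift and inverse-transform each masked spectrum separately.
    low = np.fft.ifft2(np.fft.ifftshift(spectrum * low_mask)).real
    high = np.fft.ifft2(np.fft.ifftshift(spectrum * ~low_mask)).real
    return low, high


# Usage: a flat background plus a fine checkerboard "texture".
n = 64
i, j = np.indices((n, n))
checker = np.where((i + j) % 2 == 0, 1.0, -1.0)
patch = 0.5 + 0.1 * checker
low, high = split_frequency_bands(patch, radius=4)
print(np.allclose(low, 0.5), np.allclose(high, 0.1 * checker))  # True True
```

Because the FFT is linear and the two masks are complementary, `low + high` reconstructs the original patch. The flat background falls entirely in the low band and the pixel-level checkerboard entirely in the high band. A model could, under this assumption, feed such frequency components to the ViT alongside the raw patches.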