DecomFormer: Decompose Self-Attention Via Fourier Transform for VHR Aerial Image Scene Classification

Xiyuan Gao, Xinbo Gao, Tao Wang, Yan Zhang, Xiao Pu
DOI: https://doi.org/10.1109/ICASSP49357.2023.10096132
2023-06-04
Abstract: Very high-resolution (VHR) aerial image scene classification is an essential task for aerial image understanding. Although transformer-based models have demonstrated strong ability in natural image classification, transformer-based methods have received little attention for VHR aerial image tasks, because the complexity of self-attention grows quadratically with image resolution. To address this issue, we decompose self-attention via the Fourier transform and propose a novel Fourier self-attention (FSA) mechanism. Based on FSA, we design a highly efficient network named DecomFormer, which learns contextual relationships separately in the real part and the imaginary part of the Fourier domain. Theoretically, DecomFormer reduces the complexity of the naive self-attention mechanism from O(n^2) to O(n log n). Extensive experiments on public VHR aerial image classification benchmarks demonstrate DecomFormer's efficiency, especially on very high-resolution images.
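The abstract does not give the FSA formulation, so the following is only a minimal NumPy sketch of FFT-based token mixing in the spirit of FNet and GFNet, not the authors' DecomFormer. It assumes n is the number of tokens and that the real and imaginary parts of the spectrum are handled with separate element-wise learnable weights; the function names fourier_token_mixing, naive_attention and op_counts are hypothetical. It contrasts the O(n log n) FFT mixer with naive softmax attention, whose n x n score matrix is the source of the quadratic cost.

```python
import math

import numpy as np


def fourier_token_mixing(x: np.ndarray, w_real: np.ndarray, w_imag: np.ndarray) -> np.ndarray:
    """Hypothetical FFT-based token mixer (NOT the paper's FSA).

    x:      (n, d) real-valued token features.
    w_real: (n // 2 + 1, d) weights applied to the real part of the spectrum.
    w_imag: (n // 2 + 1, d) weights applied to the imaginary part.
    """
    n = x.shape[0]
    # Real FFT along the token axis: O(n log n) per feature channel,
    # and every frequency bin depends on every input token (global context).
    spec = np.fft.rfft(x, axis=0)
    # Assumption: real and imaginary parts get separate weights,
    # loosely following the abstract's "real part and imaginary part" wording.
    real = spec.real * w_real
    imag = spec.imag * w_imag
    # Recombine and map back to the token domain.
    return np.fft.irfft(real + 1j * imag, n=n, axis=0)


def naive_attention(x: np.ndarray) -> np.ndarray:
    """Single-head softmax self-attention with identity Q/K/V projections.

    Building the (n, n) score matrix costs O(n^2 d) time and O(n^2) memory.
    """
    scores = x @ x.T / math.sqrt(x.shape[1])
    scores -= scores.max(axis=1, keepdims=True)  # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=1, keepdims=True)
    return weights @ x


def op_counts(n: int) -> tuple[int, float]:
    """Rough per-channel operation counts: n^2 for attention, n*log2(n) for the FFT."""
    return n * n, n * math.log2(n)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n, d = 64, 8  # illustrative: an 8x8 patch grid with 8 channels
    x = rng.standard_normal((n, d))
    w_r = rng.standard_normal((n // 2 + 1, d))
    w_i = rng.standard_normal((n // 2 + 1, d))
    print(fourier_token_mixing(x, w_r, w_i).shape, naive_attention(x).shape)  # (64, 8) (64, 8)
    print(op_counts(4096))  # (16777216, 49152.0), e.g. a 64x64 patch grid
```

With both weight arrays set to all ones the mixer reduces to the identity, because irfft inverts rfft, which makes a convenient sanity check. For 4096 tokens the rough counts differ by a factor of about 341, which illustrates, but does not measure, the O(n^2) versus O(n log n) gap the abstract describes.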
Computer Science, Engineering, Environmental Science