Bridging CNN and Transformer With Cross-Attention Fusion Network for Hyperspectral Image Classification

Fulin Xu, Shaohui Mei, Ge Zhang, Nan Wang, Qian Du
DOI: https://doi.org/10.1109/tgrs.2024.3419266
IF: 8.2
2024-07-13
IEEE Transactions on Geoscience and Remote Sensing
Abstract: Feature representation is crucial for hyperspectral image (HSI) classification. However, existing convolutional neural network (CNN)-based methods are limited by the convolution kernel and focus only on local features, which causes them to ignore the global properties of HSIs. Transformer-based networks can compensate for this limitation because they emphasize the global features of HSIs. Combining the advantages of these two networks in feature extraction is therefore of great importance for improving classification accuracy. To this end, a cross-attention fusion network bridging CNN and Transformer (CAF-Former) is proposed, which exploits the strength of CNNs in local feature extraction and the Transformer's ability to learn long-range dependencies for hyperspectral classification. To explore the local and global information within an HSI, a Dynamic-CNN branch is proposed to encode local features of pixels, while a Gaussian Transformer branch is constructed to model the global features and long-range dependencies. Moreover, so that local and global features can fully interact, a cross-attention fusion (CAF) module is proposed as a bridge to fuse the features extracted by the two branches. Experiments on several benchmark datasets demonstrate that the proposed CAF-Former significantly outperforms both CNN-based and Transformer-based state-of-the-art networks for HSI classification.
Engineering, Electrical & Electronic; Imaging Science & Photographic Technology; Remote Sensing; Geochemistry & Geophysics
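
The abstract describes fusing a local (CNN) branch and a global (Transformer) branch through cross-attention. As a rough illustration of that general idea only, the sketch below lets each branch's tokens attend over the other's using plain single-head scaled dot-product attention in NumPy. It is not the authors' CAF module: the function names (`cross_attention`, `fuse`), the token counts, the feature width, the random projection weights, and the residual-plus-mean-pooling fusion are all assumptions made for the example.

```python
import numpy as np


def softmax(x, axis=-1):
    # Subtract the row maximum for numerical stability.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def cross_attention(queries, context, w_q, w_k, w_v):
    """Single-head cross-attention: `queries` tokens attend over `context` tokens."""
    q = queries @ w_q                          # (n_q, d)
    k = context @ w_k                          # (n_c, d)
    v = context @ w_v                          # (n_c, d)
    scores = q @ k.T / np.sqrt(q.shape[-1])    # (n_q, n_c)
    weights = softmax(scores, axis=-1)         # each row sums to 1
    return weights @ v, weights


def fuse(local_feats, global_feats, params):
    """Hypothetical bidirectional fusion; not the published CAF module."""
    # Local tokens query the global tokens, and vice versa.
    local_enriched, _ = cross_attention(local_feats, global_feats, *params["l2g"])
    global_enriched, _ = cross_attention(global_feats, local_feats, *params["g2l"])
    # Residual connection, mean pooling over tokens, then concatenation.
    local_vec = (local_feats + local_enriched).mean(axis=0)
    global_vec = (global_feats + global_enriched).mean(axis=0)
    return np.concatenate([local_vec, global_vec])


rng = np.random.default_rng(0)
d = 16                                         # assumed feature width
local_feats = rng.normal(size=(49, d))         # e.g. a 7x7 patch, flattened (assumed)
global_feats = rng.normal(size=(10, d))        # e.g. 10 Transformer tokens (assumed)
params = {
    name: [rng.normal(scale=d ** -0.5, size=(d, d)) for _ in range(3)]
    for name in ("l2g", "g2l")
}

fused = fuse(local_feats, global_feats, params)
print(fused.shape)  # (32,)
```

In a real classifier the fused vector would feed a classification head and all projections would be learned; the random weights here only demonstrate the data flow and tensor shapes.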