Hierarchical Feature Fusion of Transformer with Patch Dilating for Remote Sensing Scene Classification
Xiaoning Chen, Mingyang Ma, Yong Li, Shaohui Mei, Zonghao Han, Jian Zhao, Wei Cheng
DOI: https://doi.org/10.1109/tgrs.2023.3331880
IF: 8.2
2023-01-01
IEEE Transactions on Geoscience and Remote Sensing
Abstract: Transformer-based techniques have recently emerged as a promising way to model contextual information in remote sensing (RS) scenes and are now widely applied to RS scene classification. However, making full use of the intermediate features learned by Transformers remains crucial for this task. This article therefore proposes hierarchical feature fusion of Transformer with patch dilating (HFFT-PD), which captures rich contextual information from hierarchical features to improve RS scene classification. Specifically, HFFT-PD consists of a hierarchical transformer merging (HTM) block and a lightweight adaptive channel compression (LACC) module: the HTM block is designed for the Transformer architecture to bridge the semantic gaps between features from different hierarchical blocks, and the LACC module accounts for the significance of distinct channels in the final classification features. In addition, a new Patch Dilating strategy is designed for the Transformer paradigm, functioning as a reassembly operator based on patch features. In contrast to conventional upsampling techniques, Patch Dilating upsamples without requiring supplementary information while preserving the semantic content of the local spatial structure. Extensive experiments on the UC Merced land-use dataset (UCM), the aerial image dataset (AID), and the NWPU-45 dataset, with training ratios of 80%, 50%, and 20%, respectively, show that HFFT-PD outperforms the baseline by at least 0.59%, 0.44%, and 0.99%, respectively, demonstrating its advantage over contemporary state-of-the-art methods.
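The abstract describes Patch Dilating only as a reassembly operator on patch features that upsamples without supplementary information. The sketch below is one plausible reading of that description, not the authors' implementation: it assumes each token's channels can be split into r x r groups that are rearranged into an r x r spatial neighborhood, similar in spirit to a pixel-shuffle. The function name `patch_dilate`, its parameters, and the channel-grouping order are all hypothetical.

```python
import numpy as np


def patch_dilate(tokens, h, w, r=2):
    """Rearrange (h*w, C) patch tokens into (h*r * w*r, C // r**2) tokens.

    ASSUMPTION: this channel-to-space reassembly is an illustrative guess at
    what a "reassembly operator predicated on patch features" could look like.
    It uses no interpolation and no extra inputs; values are only moved.
    """
    n, c = tokens.shape
    if n != h * w:
        raise ValueError("token count must equal h * w")
    if c % (r * r) != 0:
        raise ValueError("channel count must be divisible by r * r")
    c_out = c // (r * r)

    # Split each token's channels into an r x r block of c_out-dim vectors.
    grid = tokens.reshape(h, w, r, r, c_out)
    # Interleave block rows/cols with patch rows/cols: (h, r, w, r, c_out).
    grid = grid.transpose(0, 2, 1, 3, 4)
    # Flatten back to a token sequence on the (h*r) x (w*r) grid.
    return grid.reshape(h * r * w * r, c_out)


if __name__ == "__main__":
    # Toy input: a 2 x 2 grid of patches, 8 channels each.
    x = np.arange(2 * 2 * 8, dtype=float).reshape(4, 8)
    y = patch_dilate(x, h=2, w=2, r=2)
    print(x.shape, "->", y.shape)
```

Under this assumption, each source patch expands into a contiguous r x r block of the output grid, so local spatial structure is kept together, and the output contains exactly the input values with nothing interpolated or added.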