Gradient-Guided Multi-Scale Focal Attention Network for Remote Sensing Scene Classification

Yue Zhao, Maoguo Gong, A. K. Qin, Mingyang Zhang, Zhuping Hu, Tianqi Gao, Yan Pu
DOI: https://doi.org/10.1109/tgrs.2024.3424489
IF: 8.2
2024-01-01
IEEE Transactions on Geoscience and Remote Sensing
Abstract: Remote sensing scene classification (RSSC) aims to understand and analyze scene-level semantic information with complex geographical properties. Although advanced deep models have been highly successful at automatically capturing hierarchical embedding representations and have gradually come to dominate RSSC, it remains a great challenge to focus precisely on targets at variable scales that are highly relevant to the corresponding scene and to separate them from the background. Motivated by this, in this article we present the gradient-guided multiscale focal attention network (GMFANet) for RSSC, which adaptively localizes representative multiscale semantic representations for complex scenes. In particular, we propose a lightweight parameterized hierarchical multiscale attention (HMA) mechanism whose main aim is to adaptively enhance physical detail and high-level semantic information at different layers, rather than treating every scale equally, while eliminating the redundant information inherent in conventional attention mechanisms. Subsequently, a gradient-guided spatial focused attention (GSFA) module is designed to accurately localize critical regions at multiple scales by dynamically combining a gradient-activated reference attention map with a prediction attention map obtained from supervised information-based learning. In addition, a curriculum-driven dynamic attention fusion (CDAF) strategy fuses these spatial attention maps from easy to hard, in order to avoid poor local optima and reduce early learning ambiguity. Extensive comparative experiments and ablation analyses on real-world public RSSC datasets indicate that our approach achieves state-of-the-art performance. The code is available at https://github.com/bling2beyond/GMFANet.
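The abstract describes fusing two spatial attention maps under a curriculum. The snippet below is only a minimal sketch of that general idea and is not the authors' implementation (see the linked repository for that). The function names `curriculum_weight` and `fuse_attention`, the linear schedule, and the assumption that training shifts from the reference map toward the predicted map are all hypothetical choices made for illustration.

```python
def curriculum_weight(epoch, total_epochs):
    """Hypothetical linear schedule: 0.0 at the first epoch, 1.0 at the last."""
    if total_epochs <= 1:
        return 1.0
    # Clamp so that out-of-range epochs still give a valid weight.
    return min(max(epoch / (total_epochs - 1), 0.0), 1.0)


def fuse_attention(reference_map, predicted_map, weight):
    """Convex combination of two same-shaped 2-D attention maps (lists of rows).

    weight = 0.0 returns the reference map, weight = 1.0 the predicted map.
    Which map should dominate early in training is an assumption here.
    """
    if len(reference_map) != len(predicted_map):
        raise ValueError("attention maps must have the same number of rows")
    fused = []
    for ref_row, pred_row in zip(reference_map, predicted_map):
        if len(ref_row) != len(pred_row):
            raise ValueError("attention maps must have the same row length")
        fused.append([(1.0 - weight) * r + weight * p
                      for r, p in zip(ref_row, pred_row)])
    return fused


if __name__ == "__main__":
    reference = [[1.0, 0.0], [0.0, 0.0]]   # stand-in for a gradient-activated map
    predicted = [[0.0, 1.0], [0.0, 0.0]]   # stand-in for a learned attention map
    for epoch in (0, 5, 9):
        w = curriculum_weight(epoch, total_epochs=10)
        print(epoch, fuse_attention(reference, predicted, w))
```

With this schedule the fused map equals the reference map at the first epoch and the predicted map at the last; any other monotone schedule could be substituted without changing the fusion step.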