TransFG: A Cross-View Geo-Localization of Satellite and UAVs Imagery Pipeline Using Transformer-Based Feature Aggregation and Gradient Guidance

Hu Zhao, Keyan Ren, Tianyi Yue, Chun Zhang, Shuai Yuan
DOI: https://doi.org/10.1109/tgrs.2024.3352418
IF: 8.2
2024-01-01
IEEE Transactions on Geoscience and Remote Sensing
Abstract: Cross-view geo-localization of satellite and unmanned aerial vehicle (UAV) imagery has attracted extensive attention due to its tremendous potential for navigation in global navigation satellite system (GNSS)-denied environments. However, inadequate feature representation across different views, coupled with positional shifts and distance-scale uncertainty, remains a key challenge. Most existing research has focused on extracting comprehensive and fine-grained information, yet effective feature representation and alignment deserve equal importance. In this article, we propose TransFG, an innovative transformer-based pipeline for robust cross-view image matching, which incorporates feature aggregation (FA) and gradient guidance (GG) modules. TransFG takes advantage of FA and GG synergistically, achieving an effective balance between feature representation and alignment. Specifically, the proposed FA module implicitly learns salient features and dynamically aggregates contextual features from the vision transformer (ViT). The proposed GG module uses the gradient information of local features to further enhance the cross-view feature representation and to align specific instances across different views. Extensive experiments demonstrate that our pipeline outperforms existing methods in cross-view geo-localization, improving on state-of-the-art (SOTA) methods in both Recall@1 (R@1) and average precision (AP). The code has been released at https://github.com/happyboy1234/TransFG.
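The abstract reports results in R@1 and AP, the usual retrieval metrics for cross-view matching. As a minimal sketch of how such metrics are commonly computed for a single query, assuming cosine-similarity ranking and the textbook definition of average precision (the released TransFG evaluation code may differ in detail), the following uses hypothetical function names and toy 2-D embeddings that are not taken from the paper:

```python
import numpy as np

def rank_gallery(query, gallery):
    """Return gallery indices sorted by descending cosine similarity to the query."""
    q = query / np.linalg.norm(query)
    g = gallery / np.linalg.norm(gallery, axis=1, keepdims=True)
    return np.argsort(-(g @ q), kind="stable")

def recall_at_k(ranking, positives, k=1):
    """1.0 if any true match appears among the top-k results, else 0.0."""
    return float(len(set(ranking[:k].tolist()) & set(positives)) > 0)

def average_precision(ranking, positives):
    """Textbook AP: mean of the precision values at the ranks of true matches."""
    hits = np.isin(ranking, list(positives))
    if not hits.any():
        return 0.0
    ranks = np.flatnonzero(hits) + 1  # 1-based ranks of the true matches
    precision_at_hits = np.arange(1, len(ranks) + 1) / ranks
    return float(precision_at_hits.mean())

# Toy example (assumed data): one UAV-view query against four satellite-view embeddings.
query = np.array([1.0, 0.0])
gallery = np.array([[0.9, 0.1], [0.0, 1.0], [0.7, 0.7], [1.0, 0.05]])
positives = {0, 2}  # gallery indices that depict the same location as the query

ranking = rank_gallery(query, gallery)
r_at_1 = recall_at_k(ranking, positives, k=1)
r_at_2 = recall_at_k(ranking, positives, k=2)
ap = average_precision(ranking, positives)
```

Benchmark scores are obtained by averaging these per-query values over all queries; in this toy case the best-ranked gallery item is a distractor, so R@1 is zero even though both true matches appear in the top three.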
Imaging science & photographic technology; remote sensing; engineering, electrical & electronic; geochemistry & geophysics
What problem does this paper attempt to address?