Adaptive Global Embedding Learning: A Two-Stage Framework for UAV-View Geo-Localization

Cuiwei Liu, Shishen Li, Chong Du, Huaijun Qiu
DOI: https://doi.org/10.1109/lsp.2024.3392676
2024-05-07
IEEE Signal Processing Letters
Abstract: This letter addresses the UAV-view geo-localization problem, which is essentially bi-directional cross-view matching between UAV-view and satellite-view images. Existing studies have confirmed the importance of learning part-wise representations for this task. We go a step further by proposing a two-stage learning framework. The first stage extracts part-wise representations. In the second stage, a novel Adaptive Embedding Network (AEN) integrates these representations into a global embedding of the entire image, so that local parts do not all contribute equally to image similarity measures. Current mainstream methods typically employ Cross-Entropy loss to learn location-dependent representations, pushing representations of different locations apart in the learned space. Some approaches also use KL loss or Triplet loss to pull a pair of UAV and satellite images from the same location closer together, in order to learn view-invariant representations. However, they overlook a critical concern: a notable representation bias exists among UAV-view images captured at the same location but from different viewpoints or heights. To address this issue, we devise a novel cross-view matching loss that narrows the distance between the global embedding of a satellite-view image and the affinity-aware prototype of the multiple UAV-view images that truly match it. Experimental results on the University-1652 dataset show that similarity measures in the learned embedding space generalize well to images from unseen locations, and that the framework outperforms previous methods in cross-view matching.
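The abstract does not give AEN's architecture or the exact form of the matching loss, so the following is only a minimal sketch of the two ideas under stated assumptions, not the authors' implementation. It assumes: (1) AEN can be approximated by softmax-weighted pooling of part-wise features, with per-part scores that a real network would learn but that are passed in here; (2) "affinity-aware" means each true-matched UAV embedding is weighted by a temperature-scaled softmax of its cosine similarity to the satellite embedding; (3) the loss is the squared Euclidean distance between L2-normalized embeddings. The names `adaptive_global_embedding`, `affinity_prototype`, and `prototype_matching_loss` are hypothetical.

```python
import math
from typing import List, Sequence

Vector = List[float]


def l2_normalize(v: Sequence[float]) -> Vector:
    """Scale a vector to unit length (a zero vector is returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else list(v)


def softmax(scores: Sequence[float], temperature: float = 1.0) -> Vector:
    """Numerically stable softmax over a list of scores."""
    scaled = [s / temperature for s in scores]
    m = max(scaled)
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def adaptive_global_embedding(parts: Sequence[Sequence[float]],
                              part_scores: Sequence[float]) -> Vector:
    """Hypothetical stand-in for AEN: fuse part-wise features with
    softmax weights so that parts do not contribute equally."""
    weights = softmax(part_scores)
    dim = len(parts[0])
    fused = [sum(w * p[d] for w, p in zip(weights, parts)) for d in range(dim)]
    return l2_normalize(fused)


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Dot product; equals cosine similarity because inputs are unit-length."""
    return sum(x * y for x, y in zip(a, b))


def affinity_prototype(satellite: Sequence[float],
                       uav_views: Sequence[Sequence[float]],
                       temperature: float = 0.1) -> Vector:
    """Assumed affinity weighting: UAV views that agree more with the
    satellite embedding get larger weight in the prototype."""
    weights = softmax([cosine(satellite, u) for u in uav_views], temperature)
    dim = len(satellite)
    proto = [sum(w * u[d] for w, u in zip(weights, uav_views)) for d in range(dim)]
    return l2_normalize(proto)


def prototype_matching_loss(satellite: Sequence[float],
                            uav_views: Sequence[Sequence[float]],
                            temperature: float = 0.1) -> float:
    """Squared Euclidean distance between the satellite embedding and the
    affinity-weighted prototype of its true-matched UAV embeddings."""
    proto = affinity_prototype(satellite, uav_views, temperature)
    return sum((s - p) ** 2 for s, p in zip(satellite, proto))


if __name__ == "__main__":
    # Two part features; the first part is scored higher, so it dominates.
    sat = adaptive_global_embedding([[1.0, 0.0], [0.0, 1.0]], [2.0, 0.0])
    # Three UAV views of the same location; the last one is an outlier view.
    uavs = [l2_normalize([1.0, 0.2]), l2_normalize([0.9, 0.1]), [0.0, 1.0]]
    print(round(prototype_matching_loss(sat, uavs), 4))
```

The temperature controls how strongly off-viewpoint UAV images are down-weighted: a small value lets the prototype follow the views that already agree with the satellite image, while a large value approaches a plain mean over all UAV views.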
engineering, electrical & electronic