Cross-Attention Network for Cross-View Image Geo-Localization

Jingjing Wang, Xi Li
DOI: https://doi.org/10.1109/isas59543.2023.10164457
2023-01-01
Abstract: The task of cross-view geo-localization is to retrieve, for a ground-view query image with an unknown location, the corresponding image from a database of aerial-view images labeled with Global Positioning System (GPS) coordinates. The task is challenging because of the large differences in viewpoint and appearance between the two types of images. To overcome these challenges, we have developed a novel attention-based method that leverages a key localization cue. We propose a cross-attention-based Swap Encoder Module (SEM), which aligns features effectively by directing the network’s focus toward relevant information. Additionally, we employ an Image Proposal Network (IPN) to ensure that corresponding aerial and ground-view images are supplied as consistent inputs during both the training and validation phases. Experimental results show that our proposed network significantly outperforms existing methods on the CVUSA benchmark dataset, improving top-1 recall from 61.4% to 71.45% and top-10 recall from 90.49% to 92.30%.
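The top-1 and top-10 figures above are Recall@K values: the percentage of ground-view queries whose true aerial match appears among the K most similar aerial images. The following is a minimal sketch of that metric, not the authors' evaluation code. It assumes a precomputed query-by-reference similarity matrix in which query i's correct match is reference i, which is the one-to-one pairing used by benchmarks such as CVUSA. The function name `recall_at_k` and the tie-handling rule are illustrative choices, not taken from the paper.

```python
from typing import Dict, Sequence


def recall_at_k(
    similarity: Sequence[Sequence[float]], ks: Sequence[int] = (1, 5, 10)
) -> Dict[int, float]:
    """Return Recall@K (in percent) for each K in `ks`.

    similarity[i][j] is the score between ground query i and aerial
    reference j. Assumption: the true match of query i is reference i.
    """
    n = len(similarity)
    if n == 0:
        raise ValueError("similarity matrix is empty")
    hits = {k: 0 for k in ks}
    for i, row in enumerate(similarity):
        if len(row) != n:
            raise ValueError("expected a square query-by-reference matrix")
        true_score = row[i]
        # Rank of the true match = number of references scored strictly
        # higher (assumption: ties are resolved in favor of the true match).
        rank = sum(1 for s in row if s > true_score)
        for k in ks:
            if rank < k:
                hits[k] += 1
    return {k: 100.0 * hits[k] / n for k in ks}


if __name__ == "__main__":
    # Toy 3x3 example: only query 0 ranks its true match first,
    # but every true match is within the top 2.
    sim = [
        [0.9, 0.1, 0.3],
        [0.2, 0.4, 0.8],
        [0.5, 0.7, 0.6],
    ]
    print(recall_at_k(sim, ks=(1, 2)))
```

Counting references with a strictly higher score avoids a full sort per query; on a real test set one would typically compute the similarity matrix from embeddings and vectorize this step.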