A Novel Geo-Localization Method for UAV and Satellite Images Using Cross-View Consistent Attention

Zhuofan Cui, Pengwei Zhou, Xiaolong Wang, Zilun Zhang, Yingxuan Li, Hongbo Li, Yu Zhang
DOI: https://doi.org/10.3390/rs15194667
IF: 5
2023-09-23
Remote Sensing
Abstract: Geo-localization is widely used to obtain the longitude and latitude of an unmanned aerial vehicle (UAV) for navigation in outdoor flight. Because GPS signals can be interfered with or blocked, methods based on image retrieval, which are less susceptible to interference, have received extensive attention in recent years. UAV geo-localization can be achieved by querying pre-obtained, GPS-tagged satellite images with drone images taken from different perspectives. In this paper, an image transformation technique is used to extract cross-view geo-localization information from UAVs and satellites. A single-stage training method for UAV and satellite geo-localization is proposed for the first time; it performs cross-view feature extraction and image retrieval simultaneously and achieves higher accuracy than existing multi-stage training techniques. A novel piecewise soft-margin triplet loss function is designed to keep model parameters from being trapped in suboptimal sets caused by the lack of constraint on positive and negative samples. The results show that the proposed loss function improves image retrieval accuracy and yields better convergence. Moreover, a data augmentation method for satellite images is proposed to overcome the disproportionate numbers of image samples. On the University-1652 benchmark, the proposed method achieves state-of-the-art results, with a 6.67% improvement in recall rate (R@1) and 6.13% in average precision (AP). All code will be made public to promote reproducibility.
environmental sciences, imaging science & photographic technology, remote sensing, geosciences, multidisciplinary
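For readers unfamiliar with the two metrics quoted in the abstract, the sketch below shows one common way to compute Recall@K and average precision for a single retrieval query. It is an illustration only: the function names and the toy labels are hypothetical, and the textbook AP definition used here may differ in detail from the evaluation script that the University-1652 benchmark or the authors use.

```python
from typing import Sequence


def recall_at_k(ranked_labels: Sequence[int], query_label: int, k: int = 1) -> float:
    """Return 1.0 if any of the top-k retrieved items matches the query's location label."""
    return 1.0 if query_label in ranked_labels[:k] else 0.0


def average_precision(ranked_labels: Sequence[int], query_label: int) -> float:
    """Mean of the precision values taken at each rank where a correct match appears.

    Assumption: this is the plain (non-interpolated) textbook definition.
    """
    hits = 0
    precision_sum = 0.0
    for rank, label in enumerate(ranked_labels, start=1):
        if label == query_label:
            hits += 1
            precision_sum += hits / rank
    return precision_sum / hits if hits else 0.0


# Toy example: gallery labels sorted by descending similarity to a query with label 7.
ranked = [3, 7, 7, 1]
r_at_1 = recall_at_k(ranked, query_label=7, k=1)
r_at_2 = recall_at_k(ranked, query_label=7, k=2)
ap = average_precision(ranked, query_label=7)
```

Benchmark-level R@1 and AP are then the averages of these per-query values over all queries.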
### What problem does this paper attempt to solve?

This paper addresses geo-localization from unmanned aerial vehicle (UAV) and satellite images. Specifically, it aims to improve the accuracy and efficiency of cross-view image retrieval through a novel cross-view consistent attention method.

#### Main Issues:

1. **GPS Signal Interference**: During outdoor flights, UAV navigation relies on GPS signals, which can be interfered with or obstructed.
2. **Image Feature Extraction**: Traditional methods based on convolutional neural networks (CNNs) struggle with visual distractors in cross-view images, such as buildings with similar colors or shapes, which lowers image retrieval accuracy.
3. **Training Complexity**: Existing cross-view image retrieval methods often require multi-stage training, which makes model training more complex and time-consuming.
4. **Data Imbalance**: The imbalance between the number of UAV images and the number of satellite images causes the model to focus excessively on certain categories during training, making it difficult to learn the distribution of minority-category images.

#### Solutions:

1. **Single-Stage Training Method**: A single-stage training method based on the Vision Transformer architecture performs cross-view feature extraction and image retrieval simultaneously, improving accuracy and reducing training complexity.
2. **Piecewise Soft-Margin Triplet Loss Function**: A new piecewise soft-margin triplet loss function is designed to keep model parameters from falling into suboptimal sets, improving retrieval accuracy and convergence.
3. **Color Transfer Method**: To reduce the interference caused by color inconsistency between UAV and satellite images, a color transfer technique is introduced that makes the color distribution of UAV images consistent with that of satellite images.
4. **Data Augmentation Strategy**: A data augmentation method for satellite images is proposed to address the data imbalance and improve model performance.

#### Main Contributions:

1. A single-stage training method for cross-view geo-localization image retrieval is proposed. It achieves higher retrieval accuracy than existing multi-stage methods while reducing the complexity and time cost of model training.
2. A new piecewise loss function is designed to overcome the issue of a small proportion of positive samples during training. Combined with the color transfer module and data augmentation techniques, it effectively improves geo-localization accuracy.
3. The method achieves state-of-the-art results on the University-1652 benchmark, with a 6.67% improvement in R@1 and 6.13% in AP, and the authors state that all code will be released to promote reproducibility.
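As background for the loss-function contribution, the following minimal sketch shows the standard (non-piecewise) soft-margin triplet loss, ln(1 + exp(α(d_pos − d_neg))), which is commonly used as a baseline in cross-view retrieval. It is not the paper's piecewise variant, whose exact form is not given in this summary. The function names, the squared Euclidean distance, and the default `alpha` are assumptions made for illustration.

```python
import math
from typing import Sequence


def squared_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two feature vectors of equal length."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def soft_margin_triplet_loss(
    anchor: Sequence[float],
    positive: Sequence[float],
    negative: Sequence[float],
    alpha: float = 1.0,  # scaling factor; an illustrative default, not a value from the paper
) -> float:
    """Baseline soft-margin triplet loss: ln(1 + exp(alpha * (d_pos - d_neg)))."""
    d_pos = squared_distance(anchor, positive)
    d_neg = squared_distance(anchor, negative)
    x = alpha * (d_pos - d_neg)
    # Numerically stable softplus: max(x, 0) + ln(1 + exp(-|x|)).
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


# Toy 2-D features: a UAV-view anchor, the matching satellite image, a non-matching one.
easy = soft_margin_triplet_loss([1.0, 0.0], [1.0, 0.0], [0.0, 1.0])
hard = soft_margin_triplet_loss([1.0, 0.0], [0.0, 1.0], [1.0, 0.0])
tie = soft_margin_triplet_loss([0.0, 0.0], [0.0, 0.0], [0.0, 0.0])
```

The loss is small when the anchor is closer to its positive than to its negative, and it grows roughly linearly once the ordering is violated. A piecewise variant, as described in the paper, would apply different constraints to different ranges of this distance gap.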