Abstract: Street-to-aerial image geo-localization, which matches a query street-view image to the GPS-tagged aerial images in a reference set, has attracted increasing attention recently. In this paper, we revisit this problem and point out the ignored issue about image alignment information. We show that the performance of a simple Siamese network is highly dependent on the alignment setting and the comparison of previous works can be unfair if they have different assumptions. Instead of focusing on the feature extraction under the alignment assumption, we show that improvements in metric learning techniques significantly boost the performance regardless of the alignment. Without leveraging the alignment information, our pipeline outperforms previous works on both panorama and cropped datasets. Furthermore, we conduct visualization to help understand the learned model and the effect of alignment information using Grad-CAM. With our discovery on the approximate rotation-invariant activation maps, we propose a novel method to estimate the orientation/alignment between a pair of cross-view images with unknown alignment information. It achieves state-of-the-art results on the CVUSA dataset.
What problem does this paper attempt to address?
This paper addresses three key challenges in cross-view (street-to-aerial) image geo-localization:
1. **Impact of Image Alignment Information**:
- The paper points out that existing works make different assumptions about the alignment between street-view and aerial-view images, which can make performance comparisons unfair. Specifically, the alignment setting has a significant impact on a model's retrieval performance (as shown in Table 1). The authors therefore aim to improve generalization by removing the alignment assumption.
2. **Improving Retrieval Performance without Relying on Alignment Information**:
- The authors investigate how to improve retrieval performance without assuming image alignment at inference time. To this end, they explore and improve metric learning techniques, in particular a global mining strategy and a new loss function, to address the specific challenges of cross-view matching.
3. **Estimating Unknown Alignment Information**:
- The paper asks whether the alignment (i.e., the relative orientation angle) between a street-view image and an aerial-view image can be estimated without explicit supervision. The authors find that activation maps carry geometric information that is largely independent of alignment, and on this basis propose a new method to estimate the orientation of an image pair.
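To make the unaligned setting concrete, the following is a minimal sketch of how an aerial image could be randomly rotated so that its orientation no longer matches the street view. It is an illustration only: the function name `random_rotate_aerial` is hypothetical, and rotations are restricted to multiples of 90 degrees to avoid interpolation, which is a simplification rather than the setting used in the paper.

```python
import numpy as np

def random_rotate_aerial(aerial, rng):
    """Rotate a square aerial image (H, W, C) by a random multiple of 90 degrees.

    Returns the rotated image and the counter-clockwise angle in degrees.
    Assumption: quarter turns only; arbitrary angles would need interpolation.
    """
    k = int(rng.integers(0, 4))  # 0, 1, 2 or 3 quarter turns
    rotated = np.rot90(aerial, k=k, axes=(0, 1))
    return rotated, 90 * k

# Toy example on a 4x4 RGB-like array.
rng = np.random.default_rng(0)
aerial = np.arange(4 * 4 * 3).reshape(4, 4, 3)
rotated, angle = random_rotate_aerial(aerial, rng)

# Undoing the rotation recovers the original image.
restored = np.rot90(rotated, k=-(angle // 90), axes=(0, 1))
```

Returning the sampled angle alongside the image keeps the ground-truth orientation available, which is useful if the same pairs are later used to evaluate an orientation estimator.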
### Specific Problems and Solutions
- **Impact of Alignment Information**:
- Through an ablation study on a simple Siamese network, the authors demonstrate that alignment information strongly affects retrieval performance. The experiments show that training with randomly rotated aerial-view images (i.e., without using alignment information) makes the model perform better on the unaligned validation set, which indicates better generalization.
- **Improvement of Metric Learning Techniques**:
- The authors introduce a binomial loss function and a global mining strategy to address the imbalance between positive and negative samples in cross-view matching. With these improvements, the model achieves better retrieval performance without relying on alignment information.
- **Direction Estimation Method**:
- Using activation maps generated by Grad-CAM, the authors observe that the maps are approximately rotation-invariant even in the unaligned case. Exploiting this property, they propose a new method that estimates the orientation of an image pair by matching the angular distributions of activated pixels in the two views.
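The idea of matching angular distributions can be sketched as follows. This is an illustrative interpretation under stated assumptions, not the authors' exact procedure: the aerial activation map is binned by polar angle around the image center, the street panorama activation map is binned by column (each column is assumed to correspond to one azimuth), and the orientation is taken as the circular shift that maximizes the correlation between the two histograms. The function names, the bin count `n_bins`, and the angle conventions are all hypothetical and dataset-dependent.

```python
import numpy as np

def aerial_angle_hist(act, n_bins=36):
    """Histogram of activation mass over polar angle around the image center."""
    h, w = act.shape
    ys, xs = np.mgrid[0:h, 0:w]
    cy, cx = (h - 1) / 2.0, (w - 1) / 2.0
    theta = np.arctan2(ys - cy, xs - cx)  # angles in [-pi, pi]
    bins = ((theta + np.pi) / (2 * np.pi) * n_bins).astype(int) % n_bins
    hist = np.bincount(bins.ravel(), weights=act.ravel(), minlength=n_bins)
    return hist / (hist.sum() + 1e-12)

def street_angle_hist(act, n_bins=36):
    """Histogram of activation mass over panorama columns (assumed azimuth)."""
    col_mass = act.sum(axis=0)
    w = col_mass.shape[0]
    bins = (np.arange(w) * n_bins) // w
    hist = np.bincount(bins, weights=col_mass, minlength=n_bins)
    return hist / (hist.sum() + 1e-12)

def estimate_shift(hist_a, hist_b):
    """Circular shift s (in bins) such that np.roll(hist_b, s) best matches hist_a."""
    scores = [np.dot(hist_a, np.roll(hist_b, s)) for s in range(len(hist_a))]
    return int(np.argmax(scores))

# Toy check: a distribution shifted by a known number of bins is recovered.
rng = np.random.default_rng(0)
reference = rng.random(36)
shifted = np.roll(reference, -5)
shift_bins = estimate_shift(reference, shifted)
angle_deg = shift_bins * 360 / 36
```

The resolution of the estimate is limited by the bin width (10 degrees with 36 bins), so a finer histogram or interpolation around the correlation peak would be needed for more precise angles.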
### Main Contributions
1. **In-depth Analysis of the Impact of Alignment Information**:
- Experiments with different alignment settings reveal how strongly alignment information affects retrieval performance. This provides guidance for designing more robust, general-purpose frameworks and for ensuring fair comparison with previous works.
2. **Improvement of Metric Learning Techniques**:
- The binomial loss function and the global mining strategy are proposed, which significantly improve retrieval performance, especially when the alignment of inference images is not assumed.
3. **Proposing a New Direction Estimation Method**:
- The approximate rotation invariance of activation maps is identified, and based on this, an orientation estimation method that needs no explicit supervision is proposed. According to the abstract, it achieves state-of-the-art results on the CVUSA dataset.
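As a rough illustration of the metric learning ingredients named above, the sketch below shows a generic binomial deviance loss over a batch of paired embeddings, together with a simple form of hard-negative mining over a pool of embeddings. Both are assumptions made for illustration: the exact loss formulation and mining procedure used in the paper may differ, and the names `binomial_deviance_loss`, `mine_hard_negatives`, `alpha`, and `margin` are hypothetical.

```python
import numpy as np

def l2_normalize(x):
    """Scale each row to unit length so dot products are cosine similarities."""
    return x / np.linalg.norm(x, axis=1, keepdims=True)

def binomial_deviance_loss(street, aerial, alpha=5.0, margin=0.5):
    """Generic binomial deviance loss; row i of each matrix is a matching pair.

    Positive and negative terms are averaged separately, so the many
    negatives in a batch do not swamp the few positives.
    """
    sim = l2_normalize(street) @ l2_normalize(aerial).T
    pos_mask = np.eye(sim.shape[0], dtype=bool)
    pos, neg = sim[pos_mask], sim[~pos_mask]
    # np.logaddexp(0, x) is a numerically stable log(1 + exp(x)).
    pos_term = np.logaddexp(0.0, -alpha * (pos - margin)).mean()
    neg_term = np.logaddexp(0.0, alpha * (neg - margin)).mean()
    return float(pos_term + neg_term)

def mine_hard_negatives(street, aerial, k=2):
    """For each street embedding, indices of the k most similar non-matching aerials."""
    sim = l2_normalize(street) @ l2_normalize(aerial).T
    np.fill_diagonal(sim, -np.inf)  # exclude the true match
    return np.argsort(-sim, axis=1)[:, :k]

# Toy example: correctly paired embeddings give a lower loss than mismatched ones.
emb = np.eye(4)
loss_matched = binomial_deviance_loss(emb, emb)
loss_mismatched = binomial_deviance_loss(emb, np.roll(emb, 1, axis=0))
```

Mining over the whole embedding pool rather than a single batch is one plausible reading of a "global" strategy; in practice the pool would have to be refreshed as the network's weights change.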
In summary, by re-examining the cross-view image geo-localization problem, this paper exposes the unfair comparisons caused by differing alignment assumptions and proposes metric learning improvements and an orientation estimation method that improve the performance and robustness of the model.