Abstract: Cross-View Geo-Localisation is still a challenging task where additional modules, specific pre-processing or zooming strategies are necessary to determine accurate positions of images. Since different views have different geometries, pre-processing like polar transformation helps to merge them. However, this results in distorted images which then have to be rectified. Adding hard negatives to the training batch could improve the overall performance, but with the default loss functions in geo-localisation it is difficult to include them. In this article, we present a simplified but effective architecture based on contrastive learning with symmetric InfoNCE loss that outperforms current state-of-the-art results. Our framework consists of a narrow training pipeline that eliminates the need for aggregation modules, avoids further pre-processing steps and even increases the generalisation capability of the model to unknown regions. We introduce two types of sampling strategies for hard negatives. The first explicitly exploits geographically neighbouring locations to provide a good starting point. The second leverages the visual similarity between the image embeddings in order to mine hard negative samples. Our work shows excellent performance on common cross-view datasets like CVUSA, CVACT, University-1652 and VIGOR. A comparison between cross-area and same-area settings demonstrates the good generalisation capability of our model.
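To make the symmetric InfoNCE objective concrete, here is a minimal pure-Python sketch, assuming a batch of N matching pairs where ground image i and aerial image i show the same location. Each ground embedding must pick out its own aerial image among all N candidates (and vice versa), and the two cross-entropy directions are averaged. The function names and the temperature default are illustrative assumptions, not the paper's implementation or configuration.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def l2_normalize(v: Vector) -> List[float]:
    """Scale a (non-zero) vector to unit length so dot products become cosine similarities."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def cross_entropy_diagonal(logits: List[List[float]]) -> float:
    """Mean cross-entropy where the correct class of row i is column i."""
    total = 0.0
    for i, row in enumerate(logits):
        m = max(row)  # subtract the row max for numerical stability
        log_sum_exp = m + math.log(sum(math.exp(x - m) for x in row))
        total += log_sum_exp - row[i]
    return total / len(logits)


def symmetric_info_nce(ground: Sequence[Vector], aerial: Sequence[Vector],
                       temperature: float = 0.1) -> float:
    """Symmetric InfoNCE over matching (ground[i], aerial[i]) pairs.

    Every other image in the batch acts as a negative, so placing hard
    negatives in the same batch directly makes the task harder.
    The temperature default is an arbitrary placeholder.
    """
    g = [l2_normalize(v) for v in ground]
    a = [l2_normalize(v) for v in aerial]
    # logits[i][j] = cos(ground_i, aerial_j) / temperature
    logits = [[sum(x * y for x, y in zip(gi, aj)) / temperature for aj in a] for gi in g]
    transposed = [list(col) for col in zip(*logits)]
    # Average the ground->aerial and aerial->ground directions.
    return 0.5 * (cross_entropy_diagonal(logits) + cross_entropy_diagonal(transposed))


ground = [[1.0, 0.0], [0.0, 1.0]]
aerial = [[0.9, 0.1], [0.1, 0.9]]
print(symmetric_info_nce(ground, aerial))        # correctly aligned pairs: small loss
print(symmetric_info_nce(ground, aerial[::-1]))  # mismatched pairs: much larger loss
```

Because the negatives come from the batch itself, no extra loss term is needed to include hard negatives: filling a batch with look-alike locations is enough to raise the difficulty, which is what the two sampling strategies in the abstract exploit.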

What problem does this paper attempt to address?

The paper attempts to address the problem of improving model performance in the Cross-View Geo-Localisation task. Specifically, it focuses on the following aspects:

1. **Reducing Preprocessing Steps**: Traditional cross-view geo-localisation methods often require additional modules, specific preprocessing steps (such as polar transformation) or zooming strategies to determine the precise location of images. This preprocessing distorts the images, which then have to be rectified.
2. **Introducing Hard Negative Samples**: With the default loss functions used in geo-localisation (such as the triplet loss), it is difficult to include hard negative samples, i.e. negatives that are difficult to distinguish from the true match. Adding hard negatives to the training batch can improve the overall performance of the model.
3. **Improving Generalization Ability**: A model should also localise images from unknown areas, i.e. regions not seen during training. The proposed method aims to improve the model's generalization to such areas.

To achieve these goals, the paper proposes the following solutions:

- **Simplified but Effective Architecture**: A contrastive-learning framework with a symmetric InfoNCE loss. Its simple training pipeline removes the need for aggregation modules, avoids additional preprocessing steps and improves the model's generalization to unknown areas.
- **Two Hard Negative Sampling Strategies**:
  - **Geographical Distance-Based Sampling**: In the early stages of training, GPS coordinates are used to select geographically neighboring locations as hard negatives, which provides a good starting point.
  - **Dynamic Sampling Based on Visual Similarity**: In the later stages of training, the cosine similarity between street-view and satellite image embeddings is used to mine hard negative samples.
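The two sampling strategies can be sketched as follows, assuming that ground image i and aerial image i form a matching pair, so an index identifies a pair. Each strategy produces a neighbour list per sample (by GPS distance or by embedding similarity), and a batch builder packs each sample together with its neighbours so that, under an in-batch contrastive loss, they become hard negatives for one another. The haversine distance, the greedy packing in `build_batches` and all function names are illustrative assumptions, not the paper's implementation.

```python
import math
import random
from typing import Dict, List, Sequence, Tuple

LatLon = Tuple[float, float]


def haversine_km(p: LatLon, q: LatLon) -> float:
    """Great-circle distance in kilometres between two (lat, lon) points in degrees."""
    lat1, lon1, lat2, lon2 = map(math.radians, (*p, *q))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371.0 * math.asin(math.sqrt(h))


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(u, v))
    return dot / (math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v)))


def gps_neighbours(coords: Sequence[LatLon], k: int) -> Dict[int, List[int]]:
    """Strategy 1 (sketch): the k geographically closest other locations of each sample."""
    n = len(coords)
    return {i: sorted((j for j in range(n) if j != i),
                      key=lambda j: haversine_km(coords[i], coords[j]))[:k]
            for i in range(n)}


def similarity_neighbours(ground: Sequence[Sequence[float]],
                          aerial: Sequence[Sequence[float]], k: int) -> Dict[int, List[int]]:
    """Strategy 2 (sketch): for ground query i, the k most similar non-matching aerial images."""
    n = len(ground)
    return {i: sorted((j for j in range(n) if j != i),
                      key=lambda j: cosine(ground[i], aerial[j]), reverse=True)[:k]
            for i in range(n)}


def build_batches(neighbours: Dict[int, List[int]], batch_size: int,
                  seed: int = 0) -> List[List[int]]:
    """Greedily pack each sample with its not-yet-used neighbours.

    Every index lands in exactly one batch; neighbours that do not fit
    spill into the next batch, and the last batch may be smaller.
    """
    order = list(neighbours)
    random.Random(seed).shuffle(order)
    used, batches, current = set(), [], []
    for i in order:
        for j in [i, *neighbours[i]]:
            if j in used:
                continue
            used.add(j)
            current.append(j)
            if len(current) == batch_size:
                batches.append(current)
                current = []
    if current:
        batches.append(current)
    return batches


coords = [(40.0, -74.0), (40.001, -74.0), (35.0, 139.0), (35.001, 139.0)]
print(build_batches(gps_neighbours(coords, k=1), batch_size=2))  # nearby pairs share a batch
```

Following the schedule described above, `gps_neighbours` would seed the batches early in training, while `similarity_neighbours`, recomputed from the current model's embeddings, would take over later so that the mined negatives track what the model currently confuses.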
Through these methods, the paper demonstrates excellent performance on common cross-view datasets (CVUSA, CVACT, University-1652 and VIGOR), and a comparison between cross-area and same-area settings shows the model's good generalization ability.

Sample4Geo: Hard Negative Sampling For Cross-View Geo-Localisation