Abstract: Cross-view geo-localisation is typically performed at a coarse granularity, because densely sampled satellite image patches overlap heavily, and this overlap would make disambiguating patches very challenging. However, by opting for sparsely sampled patches, prior work has placed an artificial upper bound on the achievable localisation accuracy: even a perfect oracle system cannot achieve accuracy greater than the average separation of the tiles. To overcome this limitation, we propose combining cross-view geo-localisation and relative pose estimation to increase precision to a level practical for real-world application. We develop PEnG, a 2-stage system which first predicts the most likely edges from a city-scale graph representation upon which a query image lies. It then performs relative pose estimation within these edges to determine a precise position. PEnG presents the first technique to utilise both viewpoints available within cross-view geo-localisation datasets to enhance precision to a sub-metre level, with some examples achieving centimetre-level accuracy. Our proposed ensemble achieves state-of-the-art precision, with a relative Top-5m retrieval improvement of 213% over previous works and a 96.90% decrease in median Euclidean distance error, from the previous best of 734m down to 22.77m, when evaluating with 90 degree horizontal FOV images. Code will be made available: http://tavisshore.co.uk/PEnG
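The relative reduction quoted in the abstract can be checked from the two median errors it reports. The short sketch below only reproduces that arithmetic; the function name `relative_reduction` is illustrative and is not taken from the paper's code.

```python
def relative_reduction(previous: float, current: float) -> float:
    """Percentage decrease from `previous` to `current`."""
    return (previous - current) / previous * 100.0


# Median Euclidean distance errors quoted in the abstract, in metres.
previous_best_m = 734.0
peng_m = 22.77

reduction_pct = relative_reduction(previous_best_m, peng_m)
print(f"{reduction_pct:.2f}%")
```

Rounded to two decimal places, the result agrees with the 96.90% figure stated in the abstract.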

What problem does this paper attempt to address?

This paper addresses the limited accuracy of cross-view geo-localisation (CVGL). CVGL is usually carried out at a coarse granularity because densely sampled satellite image patches overlap heavily, which makes the patches very difficult to tell apart. Sampling the patches sparsely reduces the overlap, but it also places an artificial upper bound on localisation accuracy: even a perfect system cannot be more accurate than the average spacing between patches.

To address this limitation, the authors combine cross-view geo-localisation with relative pose estimation to raise localisation accuracy to the level required for practical applications. They develop a two-stage system named PEnG:

1. **First stage**: Predict the candidate edges in a city-scale graph representation on which the query image is most likely to lie.
2. **Second stage**: Perform relative pose estimation within these candidate edges to determine a precise position.

PEnG is the first technique to utilize both viewpoints available in CVGL datasets. It improves localisation accuracy to the sub-meter level, with some examples reaching centimeter-level accuracy. Compared with the previous best method, PEnG reduces the median Euclidean distance error from 734 meters to 22.77 meters and improves Top-5m retrieval by a relative 213%.

### Key contributions

1. **High-precision city-scale image localisation**: The first approach to use both viewpoints available in CVGL datasets for accurate city-scale image localisation.
2. **Simulated simple compass**: Reference embeddings are filtered according to a configurable yaw-angle threshold, which significantly improves localisation accuracy.
3. **Strong generalization ability**: Performs well in unseen urban areas, for example achieving a median error of 22.77 meters in the dense area of Manhattan.

### Method overview

The PEnG system is divided into two stages:

- **First stage**: Graph-based CVGL, which predicts the candidate edges in the city graph on which the query image is most likely to lie. This stage reduces the number of reference images, making the subsequent pose estimation more efficient.
- **Second stage**: Relative pose estimation (RPE) along the candidate edges, with the final 3-degree-of-freedom pose determined by combining the likelihoods of the two stages.

Through this combination, PEnG significantly improves localisation accuracy, especially in city-scale applications.

### Conclusion

PEnG combines a graph representation, CVGL, and relative pose estimation, demonstrating that this integration strategy is a feasible route towards practical CVGL in large-scale urban environments. In the Manhattan area (36.1 square kilometers), the experimental results show the median error reduced from 734 meters to 22.77 meters.
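The compass-style filtering listed among the key contributions can be illustrated with a minimal sketch. It assumes each reference embedding carries a known heading in degrees and that a coarse query heading is available; the `Reference` class, `angular_diff_deg`, and `filter_by_yaw` are hypothetical names chosen for illustration and are not the PEnG implementation.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class Reference:
    """Hypothetical reference entry: an embedding tagged with its heading."""
    ref_id: str
    yaw_deg: float
    embedding: Sequence[float]


def angular_diff_deg(a: float, b: float) -> float:
    """Smallest absolute difference between two headings, in [0, 180]."""
    d = abs(a - b) % 360.0
    return min(d, 360.0 - d)


def filter_by_yaw(
    refs: Sequence[Reference], query_yaw_deg: float, threshold_deg: float
) -> List[Reference]:
    """Keep only references whose heading lies within the yaw threshold."""
    return [
        r for r in refs
        if angular_diff_deg(r.yaw_deg, query_yaw_deg) <= threshold_deg
    ]


# Toy data: embeddings are placeholders, only the headings matter here.
references = [
    Reference("a", 350.0, [0.1, 0.9]),
    Reference("b", 10.0, [0.8, 0.2]),
    Reference("c", 180.0, [0.5, 0.5]),
]

kept = filter_by_yaw(references, query_yaw_deg=0.0, threshold_deg=15.0)
print([r.ref_id for r in kept])
```

The wrap-around handling in `angular_diff_deg` matters because headings of 350 and 10 degrees are both within 15 degrees of north, even though their raw values differ by 340. In this toy setup the surviving references would then be ranked by embedding similarity as usual.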

PEnG: Pose-Enhanced Geo-Localisation

Geo-Localization with Transformer-Based 2D-3D Match Network

BEV-CV: Birds-Eye-View Transform for Cross-View Geo-Localisation

Geo-Localization via Ground-to-Satellite Cross-View Image Retrieval

Fine-Grained Cross-View Geo-Localization Using a Correlation-Aware Homography Estimator

ConGeo: Robust Cross-view Geo-localization across Ground View Variations

PetalView: Fine-grained Location and Orientation Extraction of Street-view Images via Cross-view Local Search with Supplementary Materials

Unified and Real-Time Image Geo-Localization via Fine-Grained Overlap Estimation

SpaGBOL: Spatial-Graph-Based Orientated Localisation

GeoDTR+: Toward generic cross-view geolocalization via geometric disentanglement

Accurate 3-DoF Camera Geo-Localization via Ground-to-Satellite Image Matching

Beyond Geo-localization: Fine-grained Orientation of Street-view Images by Cross-view Matching with Satellite Imagery with Supplementary Materials

Cross-view Geo-localization via Learning Disentangled Geometric Layout Correspondence

GeoCLIP: Clip-Inspired Alignment between Locations and Images for Effective Worldwide Geo-localization

Cross-view image geo-localization with Panorama-BEV Co-Retrieval Network

EP2P-Loc: End-to-End 3D Point to 2D Pixel Localization for Large-Scale Visual Localization

CurriculumLoc: Enhancing Cross-Domain Geolocalization through Multi-Stage Refinement

Accurate sensing of scene geo-context via mobile visual localization

Where We Are and What We're Looking At: Query Based Worldwide Image Geo-localization Using Hierarchies and Scenes

CurriculumLoc: Enhancing Cross-Domain Geolocalization Through Multistage Refinement

Mutual Relative Position Learning Transformer for Cross-View Geo-Localization