PEnG: Pose-Enhanced Geo-Localisation

Tavis Shore, Oscar Mendez, Simon Hadfield
2024-11-24
Abstract: Cross-view geo-localisation is typically performed at a coarse granularity, because densely sampled satellite image patches overlap heavily, which makes disambiguating them very challenging. However, by opting for sparsely sampled patches, prior work has placed an artificial upper bound on the achievable localisation accuracy: even a perfect oracle system cannot achieve accuracy greater than the average separation of the tiles. To address this limitation, we propose combining cross-view geo-localisation and relative pose estimation to increase precision to a level practical for real-world application. We develop PEnG, a 2-stage system which first predicts the most likely edges from a city-scale graph representation upon which a query image lies. It then performs relative pose estimation within these edges to determine a precise position. PEnG presents the first technique to utilise both viewpoints available within cross-view geo-localisation datasets to enhance precision to a sub-metre level, with some examples achieving centimetre-level accuracy. Our proposed ensemble achieves state-of-the-art precision, with a relative Top-5m retrieval improvement of 213% over previous works, and decreases the median Euclidean distance error by 96.90%, from the previous best of 734m down to 22.77m, when evaluating with 90 degree horizontal FOV images. Code will be made available: http://tavisshore.co.uk/PEnG
Computer Vision and Pattern Recognition, Artificial Intelligence, Robotics
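As a quick sanity check on the headline figure in the abstract, the reported 96.90% reduction follows directly from the two median errors quoted there (734 m and 22.77 m). The helper name `relative_reduction` below is illustrative only and does not come from the paper.

```python
def relative_reduction(previous: float, current: float) -> float:
    """Percentage decrease going from `previous` to `current`."""
    return (previous - current) / previous * 100.0


# Median Euclidean distance errors quoted in the abstract, in metres.
previous_best_m = 734.0
peng_m = 22.77

reduction = relative_reduction(previous_best_m, peng_m)
print(f"{reduction:.2f}%")
```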
### What problem does this paper attempt to solve?

This paper addresses the limited accuracy of cross-view geo-localisation (CVGL). CVGL is usually carried out at a coarse granularity because densely sampled satellite image patches overlap heavily, which makes them very difficult to tell apart. Sampling patches sparsely reduces the overlap, but it also places an artificial upper bound on localisation accuracy: even a perfect system cannot be more accurate than the average spacing between patches.

To address this limitation, the authors combine cross-view geo-localisation with relative pose estimation to raise localisation accuracy to the level required for practical applications. They develop a two-stage system named PEnG:

1. **First stage**: Predict the candidate edges in the city-scale graph representation on which the query image is most likely to lie.
2. **Second stage**: Perform relative pose estimation within these candidate edges to determine the precise location.

PEnG is the first technique to use both viewpoints available in CVGL datasets, improving precision to the sub-metre level and, in some cases, reaching centimetre-level accuracy. Compared with the previous best method, PEnG reduces the median Euclidean distance error from 734 metres to 22.77 metres and achieves a relative Top-5m retrieval improvement of 213%.

### Key contributions

1. **High-precision city-scale image localisation**: The first approach to use both viewpoints available in CVGL datasets for precise city-scale image localisation.
2. **Simulated compass**: Reference embeddings are filtered by a configurable yaw-angle threshold, which significantly improves localisation accuracy.
3. **Strong generalisation ability**: Performs well in unseen urban areas, for example achieving a median error of 22.77 metres in the dense area of Manhattan.

### Method overview

The PEnG system is divided into two stages:

- **First stage**: Graph-based CVGL, which predicts the candidate edges in the city graph on which the query image is most likely to lie. This stage reduces the number of reference images, making the subsequent pose estimation more efficient.
- **Second stage**: Relative pose estimation (RPE) along the candidate edges, with the final 3-degree-of-freedom pose determined by combining the likelihoods of the two stages.

By combining the two stages, PEnG significantly improves localisation accuracy, especially in city-scale applications.

### Conclusion

PEnG combines a graph representation, CVGL and relative pose estimation, demonstrating that this integration is a feasible route towards practical CVGL in large-scale urban environments. In the reported experiments on the Manhattan area (36.1 square kilometres), the median error falls from 734 metres to 22.77 metres.
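The two-stage flow summarised above can be sketched roughly as follows. This is a toy illustration under stated assumptions, not the authors' implementation: the embeddings are hand-written 2-D vectors, cosine similarity stands in for the learned retrieval score, the `rpe_scores` dictionary is a hypothetical placeholder for the output of a real relative pose estimator, and multiplying the two scores is only one possible way of "combining the likelihoods of the two stages". All function and field names are invented for the example.

```python
import math


def angle_diff_deg(a: float, b: float) -> float:
    """Smallest absolute difference between two headings, in degrees."""
    d = abs(a - b) % 360.0
    return min(d, 360.0 - d)


def cosine(u, v) -> float:
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(u, v))
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(x * x for x in v))
    return dot / (norm_u * norm_v)


def candidate_edges(query_emb, query_yaw, references, yaw_threshold=45.0, top_k=2):
    """Stage 1 (sketch): compass-style yaw filter, then rank edges by similarity."""
    # Keep only references whose heading is within the yaw threshold of the query.
    kept = [r for r in references
            if angle_diff_deg(r["yaw"], query_yaw) <= yaw_threshold]
    scored = [(r["edge"], cosine(query_emb, r["emb"])) for r in kept]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


def localise(candidates, rpe_scores):
    """Stage 2 (sketch): combine retrieval and pose scores, return the best edge.

    The product used here is an assumption made for illustration.
    """
    best_edge, _ = max(candidates, key=lambda c: c[1] * rpe_scores[c[0]])
    return best_edge


# Hypothetical reference set: one embedding and heading per graph edge.
references = [
    {"edge": "A", "yaw": 10.0, "emb": [1.0, 0.0]},
    {"edge": "B", "yaw": 350.0, "emb": [0.8, 0.6]},
    {"edge": "C", "yaw": 180.0, "emb": [1.0, 0.0]},  # faces the opposite way
]

candidates = candidate_edges([1.0, 0.0], query_yaw=0.0, references=references)
# Placeholder pose-estimation confidences for the surviving edges.
rpe_scores = {"A": 0.5, "B": 0.9}
print(candidates, localise(candidates, rpe_scores))
```

In this toy data, edge C is discarded by the yaw filter even though its embedding matches the query exactly, and the second-stage score changes the final choice from the top retrieval result (A) to edge B.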