BEV-CV: Birds-Eye-View Transform for Cross-View Geo-Localisation

Tavis Shore, Simon Hadfield, Oscar Mendez
2024-09-24
Abstract: Cross-view image matching for geo-localisation is a challenging problem due to the significant visual difference between aerial and ground-level viewpoints. The method provides localisation capabilities from geo-referenced images, eliminating the need for external devices or costly equipment. This enhances the capacity of agents to autonomously determine their position, navigate, and operate effectively in GNSS-denied environments. Current research employs a variety of techniques to reduce the domain gap such as applying polar transforms to aerial images or synthesising between perspectives. However, these approaches generally rely on having a 360° field of view, limiting real-world feasibility. We propose BEV-CV, an approach introducing two key novelties with a focus on improving the real-world viability of cross-view geo-localisation. Firstly bringing ground-level images into a semantic Birds-Eye-View before matching embeddings, allowing for direct comparison with aerial image representations. Secondly, we adapt datasets into application realistic format - limited Field-of-View images aligned to vehicle direction. BEV-CV achieves state-of-the-art recall accuracies, improving Top-1 rates of 70° crops of CVUSA and CVACT by 23% and 24% respectively. Also decreasing computational requirements by reducing floating point operations to below previous works, and decreasing embedding dimensionality by 33% - together allowing for faster localisation capabilities.
Computer Vision and Pattern Recognition, Machine Learning
What problem does this paper attempt to address?
This paper addresses the difficulty of matching ground-level images to aerial images in Cross-View Geo-Localisation (CVGL), where the two viewpoints differ significantly in appearance. Existing methods narrow this domain gap with techniques such as polar transforms of aerial images or synthesis between perspectives, but they generally rely on a 360° field of view, which limits real-world feasibility. The paper proposes BEV-CV (Birds-Eye-View Cross-View), which aims to improve both the performance and the practical feasibility of CVGL by converting ground-level images into a semantic Birds-Eye-View (BEV) representation and matching them against aerial images using only a limited Field-of-View (FOV).

### Main Issues

1. **Viewpoint Differences**: Ground-level and aerial images of the same location look very different, which makes matching difficult.
2. **Limitations of Existing Methods**: Existing methods generally require 360° panoramic images, limiting their practical application.
3. **Computational Efficiency**: Existing methods demand substantial computational resources, a drawback for practical applications such as mobile robots.

### Solutions

1. **BEV Conversion**: Convert ground-level images into a semantic Birds-Eye-View representation so that they can be compared directly with aerial image representations.
2. **Limited Field-of-View Images**: Match using limited Field-of-View images aligned to the vehicle direction, which better reflects practical application scenarios.
3. **Multi-Branch Architecture**: Design a multi-branch architecture that extracts features from both views and projects them into a shared representation space.
4. **Computational Optimization**: Improve computational efficiency by reducing floating-point operations and lowering the embedding dimension.

### Experimental Results

- **Recall Rate Improvement**: On 70° crops of the CVUSA and CVACT datasets, BEV-CV improves Top-1 recall by 23% and 24%, respectively.
- **Computational Efficiency**: Floating-point operations fall below those of previous works and the embedding dimension is reduced by 33%, which lowers query time and memory requirements.

### Conclusion

BEV-CV improves the performance and practical feasibility of cross-view geo-localisation by introducing semantic Birds-Eye-View conversion and limited Field-of-View image matching, while also requiring fewer computational resources.
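
The Top-1 recall quoted above is a retrieval metric: each ground-level embedding is compared against every aerial embedding, and a query counts as a hit when its true aerial match ranks first. The following is a minimal sketch of that evaluation, not the authors' code. It assumes cosine similarity as the matching score and that `ground_embs[i]` pairs with `aerial_embs[i]`; the function names and the toy 3-D vectors are hypothetical and chosen only for illustration.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def recall_at_k(ground_embs, aerial_embs, k=1):
    """Fraction of ground queries whose true aerial match ranks in the top k.

    Assumption: ground_embs[i] and aerial_embs[i] are the matching pair.
    """
    hits = 0
    for i, query in enumerate(ground_embs):
        sims = [cosine_similarity(query, ref) for ref in aerial_embs]
        # Rank reference indices from most to least similar to the query.
        ranked = sorted(range(len(sims)), key=lambda j: sims[j], reverse=True)
        if i in ranked[:k]:
            hits += 1
    return hits / len(ground_embs)


# Toy, hand-made embeddings (illustrative only, not data from the paper).
ground = [[1.0, 0.1, 0.0], [0.0, 1.0, 0.2], [0.6, 0.0, 0.5]]
aerial = [[0.9, 0.0, 0.1], [0.1, 0.9, 0.0], [1.0, 0.0, 0.1]]

top1 = recall_at_k(ground, aerial, k=1)
top2 = recall_at_k(ground, aerial, k=2)
print(f"Top-1 recall: {top1:.3f}, Top-2 recall: {top2:.3f}")
```

In this toy set the first and third aerial vectors are near-duplicates, so two of the three queries rank the wrong one first and only recover at `k=2`. This is the kind of ambiguity that recall at larger K values is meant to expose. The brute-force loop also makes the efficiency point concrete: every comparison touches each embedding dimension once, so a smaller embedding reduces both the stored reference size and the per-query work.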