MeshVPR: Citywide Visual Place Recognition Using 3D Meshes

Gabriele Berton, Lorenz Junglas, Riccardo Zaccone, Thomas Pollok, Barbara Caputo, Carlo Masone
2024-07-24
Abstract: Mesh-based scene representation offers a promising direction for simplifying large-scale hierarchical visual localization pipelines, combining a visual place recognition step based on global features (retrieval) and a visual localization step based on local features. While existing work demonstrates the viability of meshes for visual localization, the impact of using synthetic databases rendered from them in visual place recognition remains largely unexplored. In this work we investigate using dense 3D textured meshes for large-scale Visual Place Recognition (VPR). We identify a significant performance drop when using synthetic mesh-based image databases compared to real-world images for retrieval. To address this, we propose MeshVPR, a novel VPR pipeline that utilizes a lightweight features alignment framework to bridge the gap between real-world and synthetic domains. MeshVPR leverages pre-trained VPR models and is efficient and scalable for city-wide deployments. We introduce novel datasets with freely available 3D meshes and manually collected queries from Berlin, Paris, and Melbourne. Extensive evaluations demonstrate that MeshVPR achieves competitive performance with standard VPR pipelines, paving the way for mesh-based localization systems. Data, code, and interactive visualizations are available at https://meshvpr.github.io/
Computer Vision and Pattern Recognition
### What problem does this paper attempt to solve?

This paper addresses **the performance degradation encountered when 3D mesh models are used for large-scale, city-wide visual place recognition (VPR)**. Specifically, the authors explore how to use dense 3D textured meshes for large-scale VPR and propose a new VPR pipeline, MeshVPR, to bridge the domain gap between real and synthetic images.

#### Main problems:

1. **The domain gap between synthetic and real images**: When a synthetic image database rendered from 3D meshes is used for VPR, performance drops significantly. Synthetic and real-world images differ visually, so their feature representations are inconsistent.
2. **Limitations of existing methods**: Existing mesh-based visual localization work either is limited to small scenes and skips the VPR step, relies on a real-image database for retrieval and uses the mesh only for post-processing, or performs retrieval on a limited-scale map without reporting significant performance degradation.

#### Solutions:

To address these challenges, the authors propose MeshVPR, a new VPR pipeline with the following main features:

- **Efficient matching**: It efficiently matches real-world photos against a synthetic image database.
- **Scalability**: It handles large-scale datasets and supports city-level deployment.
- **Strong results**: Through a lightweight feature-alignment framework, its performance is close to that of a standard VPR pipeline built on real-world images.

#### Method overview:

1. **Feature alignment**: A pre-trained VPR model is fine-tuned so that real and synthetic images taken from the same viewpoint are aligned in feature space.
2. **Database generation**: A synthetic image database is rendered from the 3D meshes to simulate real street-view images.
3. **Inference stage**: A specialized model extracts features from the synthetic database, while a pre-trained model processes the real-world query images.

Together, these steps allow MeshVPR to perform efficient visual place recognition in large-scale urban environments, paving the way for future mesh-based localization systems.

#### Experimental verification:

The authors conducted extensive experiments on several newly constructed datasets (Berlin, Paris, and Melbourne), demonstrating the effectiveness and competitiveness of MeshVPR. The results show that, even with a synthetic image database, MeshVPR achieves performance comparable to that obtained with a real-image database.

### Summary:

The main contribution of this paper is to fill the research gap on using 3D meshes for city-wide VPR: it quantifies the performance gap between real and synthetic image databases and proposes an effective way to bridge it. This provides an important basis for the future development of mesh-based visual localization systems.
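
#### Illustrative sketch:

The feature-alignment and inference steps in the method overview can be illustrated with a small, dependency-free sketch. It is not the authors' implementation. The mean-squared-error alignment objective, the cosine-similarity ranking, and the names `alignment_loss` and `retrieve` are assumptions chosen for illustration; the text above only states that features of real and synthetic images from the same viewpoint are aligned, and that real queries are matched against a synthetic database.

```python
import math
from typing import List, Sequence


def l2_normalize(v: Sequence[float]) -> List[float]:
    """Scale a descriptor to unit length (returned unchanged if it is all zeros)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else list(v)


def alignment_loss(real_desc: Sequence[float], synth_desc: Sequence[float]) -> float:
    """Hypothetical alignment objective: mean squared difference between the
    descriptor of a real image and the descriptor of the synthetic image
    rendered from the same viewpoint. Fine-tuning would aim to reduce it."""
    return sum((r - s) ** 2 for r, s in zip(real_desc, synth_desc)) / len(real_desc)


def retrieve(query_desc: Sequence[float],
             database_descs: Sequence[Sequence[float]],
             k: int = 1) -> List[int]:
    """Return the indices of the k database descriptors most similar to the
    query, ranked by cosine similarity (an assumed similarity measure)."""
    q = l2_normalize(query_desc)
    scored = []
    for idx, desc in enumerate(database_descs):
        d = l2_normalize(desc)
        similarity = sum(a * b for a, b in zip(q, d))
        scored.append((similarity, idx))
    # Highest similarity first; ties keep database order because sort is stable.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [idx for _, idx in scored[:k]]


if __name__ == "__main__":
    # Toy descriptors standing in for features of synthetic database images.
    database = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.7, 0.7, 0.0]]
    # Toy descriptor standing in for the feature of a real query photo.
    query = [0.9, 0.1, 0.0]
    print(retrieve(query, database, k=2))
    print(alignment_loss([1.0, 0.0], [0.0, 0.0]))
```

In a real system the descriptors would come from the VPR models described above, and a city-scale database would normally be searched with an optimized nearest-neighbor index rather than a Python loop.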