VLAD-BuFF: Burst-aware Fast Feature Aggregation for Visual Place Recognition

Ahmad Khaliq, Ming Xu, Stephen Hausler, Michael Milford, Sourav Garg

2024-09-28

Abstract: Visual Place Recognition (VPR) is a crucial component of many visual localization pipelines for embodied agents. VPR is often formulated as an image retrieval task aimed at jointly learning local features and an aggregation method. The current state-of-the-art VPR methods rely on VLAD aggregation, which can be trained to learn a weighted contribution of features through their soft assignment to cluster centers. However, this process has two key limitations. Firstly, the feature-to-cluster weighting does not account for over-represented repetitive structures within a cluster, e.g., shadows or window panes; this phenomenon is also referred to as the 'burstiness' problem, classically solved by discounting repetitive features before aggregation. Secondly, feature-to-cluster comparisons are compute-intensive for state-of-the-art image encoders with high-dimensional local features. This paper addresses these limitations by introducing VLAD-BuFF with two novel contributions: i) a self-similarity-based feature discounting mechanism to learn Burst-aware features within end-to-end VPR training, and ii) Fast Feature aggregation by reducing local feature dimensions specifically through PCA-initialized learnable pre-projection. We benchmark our method on 9 public datasets, where VLAD-BuFF sets a new state of the art. Our method is able to maintain its high recall even for 12x reduced local feature dimensions, thus enabling fast feature aggregation without compromising on recall. Through additional qualitative studies, we show how our proposed weighting method effectively downweights the non-distinctive features. Source code: https://github.com/Ahmedest61/VLAD-BuFF/

Computer Vision and Pattern Recognition

What problem does this paper attempt to address?

The paper addresses two key limitations of VLAD (Vector of Locally Aggregated Descriptors) aggregation in Visual Place Recognition (VPR):

1. **Burstiness Problem**: VLAD's feature-to-cluster weighting does not account for repetitive structures (such as shadows or window panes) that are over-represented within a cluster. These bursty features can dominate the aggregated descriptor, so that more distinctive features (such as signs or specific building details) are relatively underweighted. The paper proposes a discounting mechanism based on feature self-similarity, whose weights are learned during end-to-end VPR training.
2. **Computational Efficiency Problem**: Feature-to-cluster comparisons are compute-intensive when the image encoder produces high-dimensional local features. The paper introduces a PCA-initialized, learnable pre-projection layer that reduces the local feature dimension before aggregation. According to the abstract, recall remains high even with 12x smaller local features.

In summary, the core contribution is VLAD-BuFF, which combines self-similarity-based feature discounting with pre-aggregation dimensionality reduction to handle bursty features and speed up aggregation in VLAD-based VPR.
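The two ideas above can be illustrated with a small NumPy sketch. It is a toy under stated assumptions, not the authors' implementation: the function names (`pca_projection`, `burst_weights`, `weighted_vlad`), the Gaussian soft-count used for discounting, the parameters `sigma` and `alpha`, and the feature sizes are all hypothetical choices made for illustration. In the paper the discounting and the projection are learned end-to-end; here the projection is only PCA-initialized and nothing is trained.

```python
import numpy as np

def pca_projection(features, out_dim):
    """Return (mean, W) so that (x - mean) @ W keeps the top `out_dim` principal directions."""
    mean = features.mean(axis=0)
    # Rows of vt are principal directions, sorted by decreasing singular value.
    _, _, vt = np.linalg.svd(features - mean, full_matrices=False)
    return mean, vt[:out_dim].T

def burst_weights(x, sigma=1.0):
    """Down-weight features that have many near-duplicates in the same image (assumed form)."""
    sq_dists = ((x[:, None, :] - x[None, :, :]) ** 2).sum(axis=-1)  # (N, N)
    # Soft count of look-alikes per feature; always >= 1 because of the self-match.
    soft_count = np.exp(-sq_dists / sigma).sum(axis=1)
    return 1.0 / soft_count

def weighted_vlad(x, centers, weights, alpha=10.0):
    """Soft-assignment VLAD whose residuals are scaled by per-feature weights."""
    sq_dists = ((x[:, None, :] - centers[None, :, :]) ** 2).sum(axis=-1)  # (N, K)
    logits = -alpha * sq_dists
    logits -= logits.max(axis=1, keepdims=True)  # numerical stability for the softmax
    assign = np.exp(logits)
    assign /= assign.sum(axis=1, keepdims=True)
    residuals = x[:, None, :] - centers[None, :, :]  # (N, K, D)
    vlad = ((weights[:, None] * assign)[:, :, None] * residuals).sum(axis=0)  # (K, D)
    vlad = vlad.reshape(-1)
    return vlad / (np.linalg.norm(vlad) + 1e-12)

rng = np.random.default_rng(0)
local_feats = rng.normal(size=(200, 48))
# Simulate a "burst": 50 near-identical features, e.g. a repeated window pattern.
local_feats[:50] = local_feats[0] + 0.01 * rng.normal(size=(50, 48))

mean, proj = pca_projection(local_feats, out_dim=4)  # 48 -> 4 dims (toy 12x reduction)
reduced = (local_feats - mean) @ proj
centers = reduced[rng.choice(len(reduced), size=8, replace=False)]  # stand-in cluster centers
weights = burst_weights(reduced)
descriptor = weighted_vlad(reduced, centers, weights)
```

In this toy setup the 50 near-duplicate features each receive a much smaller weight than the remaining features, so the burst contributes less to the final descriptor. Because the projection runs before aggregation, the feature-to-cluster comparisons operate on 4-dimensional rather than 48-dimensional vectors.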

BEV^2PR: BEV-Enhanced Visual Place Recognition with Structural Cues

MultiRes-NetVLAD: Augmenting Place Recognition Training with Low-Resolution Imagery

Optimal Transport Aggregation for Visual Place Recognition

MixVPR: Feature Mixing for Visual Place Recognition

Register assisted aggregation for Visual Place Recognition

Voxelized 3D Feature Aggregation for Multiview Detection

ClusVPR: Efficient Visual Place Recognition with Clustering-based Weighted Transformer

Ghost-dil-NetVLAD: A Lightweight Neural Network for Visual Place Recognition

DMPCANet: A Low Dimensional Aggregation Network for Visual Place Recognition

Image Representation Optimization Based on Locally Aggregated Descriptors

Salient-VPR: Salient Weighted Global Descriptor for Visual Place Recognition

PointNetVLAD: Deep Point Cloud Based Retrieval for Large-Scale Place Recognition

Contextual Patch-NetVLAD: Context-Aware Patch Feature Descriptor and Patch Matching Mechanism for Visual Place Recognition

MS-NetVLAD: Multi-Scale NetVLAD for Visual Place Recognition

STA-VPR: Spatio-temporal Alignment for Visual Place Recognition

StructVPR: Distill Structural Knowledge with Weighting Samples for Visual Place Recognition

VOLoc: Visual Place Recognition by Querying Compressed Lidar Map

LVLM-empowered Multi-modal Representation Learning for Visual Place Recognition

CricaVPR: Cross-image Correlation-aware Representation Learning for Visual Place Recognition

NetVLAD: CNN Architecture for Weakly Supervised Place Recognition