MVC-VPR: Mutual Learning of Viewpoint Classification and Visual Place Recognition

Qiwen Gu, Xufei Wang, Fenglin Zhang, Junqiao Zhao, Siyue Tao, Chen Ye, Tiantian Feng, Changjun Jiang
2024-12-12
Abstract: Visual Place Recognition (VPR) aims to robustly identify locations by leveraging image retrieval based on descriptors encoded from environmental images. However, drastic appearance changes of images captured from different viewpoints at the same location pose incoherent supervision signals for descriptor learning, which severely hinder the performance of VPR. Previous work proposes classifying images based on manually defined rules or ground truth labels for viewpoints, followed by descriptor training based on the classification results. However, not all datasets have ground truth labels of viewpoints, and manually defined rules may be suboptimal, leading to degraded descriptor performance. To address these challenges, we introduce the mutual learning of viewpoint self-classification and VPR. Starting from coarse classification based on geographical coordinates, we progress to finer classification of viewpoints using simple clustering techniques. The dataset is partitioned in an unsupervised manner while simultaneously training a descriptor extractor for place recognition. Experimental results show that this approach almost perfectly partitions the dataset based on viewpoints, thus achieving mutually reinforcing effects. Our method even outperforms state-of-the-art (SOTA) methods that partition datasets using ground truth labels.
Computer Vision and Pattern Recognition
What problem does this paper attempt to address?
The main problem this paper addresses is the inconsistent supervision signal that arises in Visual Place Recognition (VPR) when images of the same location, captured from different viewpoints, differ drastically in appearance. Specifically:

1. **Challenges brought by viewpoint changes**: Images taken from different viewpoints at the same location vary significantly in appearance, which makes it difficult to learn a stable, invariant location descriptor and degrades the performance of the VPR system.
2. **Limitations of existing methods**:
   - Some existing VPR methods rely on manually defined rules or ground-truth labels for viewpoint classification. Not all datasets contain these labels, and manually defined rules may be suboptimal, which degrades descriptor performance.
   - Other methods use contrastive learning or a classification cross-entropy loss to learn location representations, but still fall short when dealing with viewpoint changes.

To solve these problems, the authors propose a mutual learning method for viewpoint self-classification and VPR (MVC-VPR). The method starts from a coarse classification based on geographical coordinates, progressively refines the viewpoint classification with simple clustering techniques, partitions the dataset in an unsupervised manner, and simultaneously trains a descriptor extractor for place recognition. Experimental results show that the method partitions the dataset almost perfectly by viewpoint, achieving a mutually reinforcing effect, and outperforms methods relying on ground-truth labels on some datasets.

### Specific solutions

- **Initial coarse classification**: Divide the dataset by geographical coordinates to form UTM classes.
- **Self-classification**: Within each UTM class, freeze the network, extract image features, and run K-Means clustering on them to obtain the viewpoint symbol \(C\).
- **Mutual learning**: Train on the groups labeled with viewpoint symbols, and after each epoch use the updated feature extractor to redo the classification within each UTM class, so that it becomes more accurate.

In this way, the method continuously refines the viewpoint classification during training, thereby improving the robustness of the descriptor and the overall performance of the VPR system.
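The coarse-to-fine partitioning described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names (`utm_class`, `kmeans`, `self_classify`), the 10 m cell size, the choice of `k=2`, and the toy 2-D feature vectors are all assumptions made for illustration. In the method summarized here, the features would come from the frozen descriptor network rather than being given by hand.

```python
from collections import defaultdict
from math import floor


def utm_class(easting, northing, cell_size=10.0):
    """Coarse class: index of the square UTM cell containing the image position.
    The cell size is a hypothetical value, not taken from the paper."""
    return (floor(easting / cell_size), floor(northing / cell_size))


def _sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, iters=20):
    """Plain Lloyd's K-Means with deterministic farthest-point initialisation."""
    k = min(k, len(points))
    centers = [list(points[0])]
    while len(centers) < k:
        # Add the point that is farthest from its nearest existing center.
        far = max(points, key=lambda p: min(_sq_dist(p, c) for c in centers))
        centers.append(list(far))
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: nearest center for every point.
        labels = [min(range(k), key=lambda j: _sq_dist(p, centers[j])) for p in points]
        # Update step: move each center to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centers[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def self_classify(samples, k=2, cell_size=10.0):
    """Map each sample index to (utm_class, viewpoint_label).
    Each sample is (easting, northing, feature_vector)."""
    by_cell = defaultdict(list)
    for idx, (easting, northing, _feat) in enumerate(samples):
        by_cell[utm_class(easting, northing, cell_size)].append(idx)
    groups = {}
    for cell, idxs in by_cell.items():
        # Cluster only the features that fall inside this UTM class.
        labels = kmeans([samples[i][2] for i in idxs], k)
        for i, label in zip(idxs, labels):
            groups[i] = (cell, label)
    return groups


# Toy data: four images in one cell facing two opposite directions, one image in another cell.
samples = [
    (3.0, 4.0, (1.0, 0.0)),
    (5.0, 6.0, (0.9, 0.1)),
    (4.0, 2.0, (-1.0, 0.0)),
    (6.0, 7.0, (-0.9, -0.1)),
    (15.0, 4.0, (0.0, 1.0)),
]
groups = self_classify(samples, k=2)
```

Under the mutual learning scheme, a call like `self_classify` would be repeated after every training epoch with features from the updated extractor, so the viewpoint groups and the descriptor improve together.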