GICNet: global information capture network for visual place recognition

Chenyu Wu, Shaoqi Hou, Zebang Qin, Guangqiang Yin, Xinzhong Wang, Zhiguo Wang
DOI: https://doi.org/10.1007/s00530-024-01534-2
IF: 3.9
2024-11-20
Multimedia Systems
Abstract: Visual Place Recognition (VPR) aims to determine an agent's location from visual information. It plays an increasingly important role in mobile robot localization and autonomous driving, among other applications. The appearance of outdoor scenes can change dramatically over time because of weather, seasonal, and lighting variations. To obtain robust descriptors that adapt to such complex environmental changes, we propose the Global Information Capture Network (GICNet), which effectively mines feature representations that remain invariant across different instances of the same scene. GICNet consists of two purpose-designed modules, shuffle channel attention (SCA) and the global information aggregator (GIA), which operate in the model's "feature extraction" and "feature aggregation" stages, respectively. Specifically, SCA uses a shuffle operation to enhance information exchange between the channels of the feature map and applies a learnable attention mask to recalibrate feature relationships along the channel dimension. As a novel holistic feature aggregation technique, GIA treats the feature maps of the pre-trained backbone as a set of global features and models the global relationships among the elements of each feature map through cascaded feature mixing. We demonstrate the effectiveness of our method through extensive experiments on multiple challenging large-scale benchmarks; in particular, the proposed method achieves the best performance on the Pitts30k dataset.
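The abstract describes the two modules only at a high level, so the following is a minimal NumPy sketch of one plausible reading, not the authors' implementation. It assumes that SCA is a ShuffleNet-style channel shuffle followed by a squeeze-and-excitation style sigmoid mask over channels, and that GIA resembles MixVPR-style aggregation: each channel's flattened feature map is one "global feature", refined by a cascade of residual MLP mixing blocks, then projected and L2-normalised. The class names `ShuffleChannelAttention` and `GlobalInfoAggregator`, the group count, reduction ratio, depth, and output sizes are all hypothetical, and the random weights stand in for parameters that would be learned during training.

```python
import numpy as np

rng = np.random.default_rng(0)


def channel_shuffle(x: np.ndarray, groups: int) -> np.ndarray:
    """ShuffleNet-style channel shuffle on a (C, H, W) feature map (assumed form of SCA's shuffle)."""
    c, h, w = x.shape
    assert c % groups == 0, "channels must be divisible by groups"
    # (C, H, W) -> (g, C/g, H, W) -> swap the two channel axes -> flatten back to (C, H, W)
    return x.reshape(groups, c // groups, h, w).transpose(1, 0, 2, 3).reshape(c, h, w)


def sigmoid(z: np.ndarray) -> np.ndarray:
    return 1.0 / (1.0 + np.exp(-z))


def layer_norm(v: np.ndarray, eps: float = 1e-5) -> np.ndarray:
    """Normalise each row (one flattened feature map) to zero mean, unit variance."""
    return (v - v.mean(axis=-1, keepdims=True)) / np.sqrt(v.var(axis=-1, keepdims=True) + eps)


class ShuffleChannelAttention:
    """Hypothetical SCA sketch: shuffle channels, then reweight them with an SE-style mask."""

    def __init__(self, channels: int, groups: int = 4, reduction: int = 4):
        self.groups = groups
        hidden = max(1, channels // reduction)
        # The attention mask is produced by these weights; random stand-ins for learned ones.
        self.w1 = rng.standard_normal((hidden, channels)) * 0.1
        self.w2 = rng.standard_normal((channels, hidden)) * 0.1

    def __call__(self, x: np.ndarray) -> np.ndarray:
        x = channel_shuffle(x, self.groups)
        squeezed = x.mean(axis=(1, 2))                                  # (C,) global average pool
        mask = sigmoid(self.w2 @ np.maximum(self.w1 @ squeezed, 0.0))  # (C,) values in (0, 1)
        return x * mask[:, None, None]                                  # recalibrate each channel


class GlobalInfoAggregator:
    """Hypothetical GIA sketch modeled on MixVPR-style cascaded feature mixing."""

    def __init__(self, channels: int, hw: int, depth: int = 4,
                 out_channels: int = 64, out_rows: int = 4):
        self.hw = hw
        # One (expand, project) weight pair per mixing block in the cascade.
        self.blocks = [
            (rng.standard_normal((hw, hw)) * 0.05, rng.standard_normal((hw, hw)) * 0.05)
            for _ in range(depth)
        ]
        self.proj_c = rng.standard_normal((out_channels, channels)) * 0.05
        self.proj_r = rng.standard_normal((hw, out_rows)) * 0.05

    def __call__(self, x: np.ndarray) -> np.ndarray:
        c, h, w = x.shape
        assert h * w == self.hw, "spatial size must match the configured hw"
        feats = x.reshape(c, h * w)  # each row is one whole feature map, i.e. a "global feature"
        for w1, w2 in self.blocks:
            # Every output position depends on all spatial positions of the same map (shared weights).
            feats = feats + np.maximum(layer_norm(feats) @ w1.T, 0.0) @ w2.T
        desc = (self.proj_c @ feats @ self.proj_r).ravel()  # (out_channels * out_rows,)
        return desc / np.linalg.norm(desc)                  # L2-normalised place descriptor


if __name__ == "__main__":
    x = rng.standard_normal((32, 8, 8))  # stand-in for a 32-channel, 8x8 backbone feature map
    sca = ShuffleChannelAttention(32, groups=4)
    gia = GlobalInfoAggregator(32, hw=64)
    d = gia(sca(x))
    print(d.shape)  # (256,)
```

Under these assumptions, the shuffle lets each attention weight be computed from channels that originally sat in different groups, and the mixing blocks operate on entire flattened maps rather than local windows. That is one way to realise the "global relationships among the elements in each feature map" the abstract refers to.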
computer science, information systems, theory & methods