Abstract: Objective The performance of traditional visual place recognition (VPR) algorithms depends on the imaging quality of optical images. However, optical cameras suffer from low temporal resolution and limited dynamic range. For example, in a scene with high-speed motion, an optical camera cannot continuously capture the rapid changes in the position of the scene on the imaging plane, which results in motion blur in the output image. When the scene brightness exceeds the recording range of the camera's photosensitive chip, the output image may be degraded by underexposure or overexposure. Blur, underexposure, and overexposure all cause a loss of image texture and structure information, which in turn reduces the performance of visual scene recognition algorithms. Therefore, image-based VPR algorithms perform poorly in high-speed and high dynamic range (HDR) scenarios. The event camera is a new type of visual sensor inspired by biological vision, characterized by low latency and HDR. Using event cameras can effectively improve the recognition performance of VPR algorithms in high-speed and HDR scenes. Therefore, this paper proposes a VPR algorithm fused with event cameras, which exploits their low latency and HDR characteristics to improve recognition performance in extreme scenarios such as high speed and HDR.

Method The proposed method first fuses the information of the query image and the events within its exposure time interval to obtain the multimodal features of the query location. It then retrieves, from the reference image database, the reference image closest to these multimodal features. Specifically, the image feature extraction module extracts features from the good-quality reference images, and the query image together with the events within its exposure time interval is fed to the multimodal feature fusion module, so that the multimodal query information can be compared with the reference images. The multimodal feature fusion module produces the fused multimodal features, and the reference image most similar to the query image is finally obtained through feature matching and retrieval, thereby completing visual scene recognition. Network training is supervised by a triplet loss. The triplet loss drives the network to reduce the vector distance between the query and positive features and to increase the vector distance between the query and negative features, until the difference between the negative distance and the positive distance is not less than the similarity distance constant. Reference images whose fields of view are similar to or different from that of the query image can therefore be distinguished by their similarity in the feature vector space, which completes the VPR task.

Result The experiments are conducted on the MVSEC and RobotCar datasets. The proposed method is compared with an image-based method, an event camera-based method, and methods that use both image and event camera information. Under different exposure conditions and in high-speed scenarios, the proposed method has advantages over existing visual scene recognition algorithms. Specifically, on the MVSEC dataset, the proposed method reaches a maximum recall rate of 99.36% and a maximum recognition accuracy of 96.34%, improving the recall rate and precision by 5.39% and 8.55%, respectively, compared with existing VPR methods. On the RobotCar dataset, the proposed method reaches a maximum recall rate of 97.33% and a maximum recognition accuracy of 93.30%, improving the recall rate and precision by 3.36% and 4.41%, respectively, compared with existing VPR methods. The experimental results show that in high-speed and HDR scenes the proposed method outperforms existing VPR algorithms and markedly improves recognition performance.

Conclusion This paper proposes a VPR algorithm that fuses event cameras. It exploits the low latency and HDR characteristics of event cameras to overcome the loss of image information in high-speed and HDR scenes. The method effectively fuses information from the image and event modalities, thereby improving the performance of VPR in high-speed and HDR scenarios.
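
The triplet-loss criterion and the nearest-feature retrieval step described in the Method part can be illustrated with a minimal sketch. This is not the paper's implementation: the Euclidean distance, the margin value of 0.5, the function names, and the toy 2-D feature vectors are all assumptions chosen for illustration, and in the actual method the features would come from the image feature extraction and multimodal feature fusion modules.

```python
import math

def l2_distance(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def triplet_loss(query, positive, negative, margin=0.5):
    """Hinge-style triplet loss.

    The loss is zero once d(query, negative) - d(query, positive) >= margin,
    i.e. once the negative is at least `margin` farther away than the positive.
    `margin` plays the role of the "similarity distance constant"; its value
    here is an assumption.
    """
    d_pos = l2_distance(query, positive)
    d_neg = l2_distance(query, negative)
    return max(0.0, d_pos - d_neg + margin)

def retrieve(query, references):
    """Return the index of the reference feature closest to the query feature."""
    return min(range(len(references)),
               key=lambda i: l2_distance(query, references[i]))

# Toy 2-D features (hypothetical values, for illustration only).
q = [0.0, 0.0]      # fused query feature
pos = [0.1, 0.0]    # reference with a similar field of view
neg = [1.0, 0.0]    # reference with a different field of view

loss_easy = triplet_loss(q, pos, neg)  # margin already satisfied
loss_hard = triplet_loss(q, neg, pos)  # roles swapped, margin violated
best = retrieve(q, [neg, pos])         # index of the closest reference
```

In this toy case the first triplet already satisfies the margin, so it contributes no loss, while the swapped triplet is penalized; retrieval then picks the reference whose feature lies nearest to the query in the feature space.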
