Abstract: In recent years, event cameras have been successfully applied to visual place recognition (VPR) using deep artificial neural networks (ANNs). However, previously proposed deep ANN architectures often fail to exploit the abundant temporal information in event streams. In contrast, deep spiking networks exhibit richer spatiotemporal dynamics and are inherently well suited to processing sparse, asynchronous event streams. Unfortunately, feeding temporally dense event volumes directly into a spiking network introduces an excessive number of time steps, which makes training prohibitively expensive for large-scale VPR tasks. To address these issues, we propose Spike-EVPR, a novel deep spiking network architecture for event-based VPR. First, we introduce two novel event representations tailored to spiking neural networks (SNNs) that exploit the spatiotemporal information in the event streams while keeping GPU memory usage during training as low as possible. Then, to exploit the full potential of these two representations, we construct a Bifurcated Spike Residual Encoder (BSR-Encoder) with strong representational capacity to extract high-level features from each of them. Next, we introduce a Shared & Specific Descriptor Extractor (SSD-Extractor), which extracts the features shared between the two representations as well as the features specific to each. Finally, we propose a Cross-Descriptor Aggregation Module (CDA-Module) that fuses these three features into a refined, robust global descriptor of the scene. Experimental results show that Spike-EVPR outperforms several existing EVPR pipelines on the Brisbane-Event-VPR and DDD20 datasets, with average Recall@1 increased by 7.61% on Brisbane and 13.20% on DDD20.
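The Recall@1 figures above follow the usual VPR retrieval protocol. As a rough illustration only, the sketch below computes Recall@N from global descriptors, assuming cosine similarity for ranking and a per-query list of ground-truth database indices. The function names, the toy descriptors, and the choice of similarity measure are assumptions made for this example and are not taken from the paper.

```python
import math


def cosine(a, b):
    """Cosine similarity between two non-zero descriptor vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def recall_at_n(query_desc, db_desc, positives, n=1):
    """Fraction of queries with at least one true match in the top-n results.

    query_desc: list of query descriptors (lists of floats)
    db_desc:    list of database descriptors (lists of floats)
    positives:  positives[i] holds the database indices that count as
                correct places for query i (hypothetical ground truth)
    """
    hits = 0
    for q, pos in zip(query_desc, positives):
        # Rank database entries from most to least similar to the query.
        ranked = sorted(
            range(len(db_desc)),
            key=lambda j: cosine(q, db_desc[j]),
            reverse=True,
        )
        if set(ranked[:n]) & set(pos):
            hits += 1
    return hits / len(query_desc)


# Toy data: three database places and three queries.
db = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
queries = [[1.0, 0.1, 0.0], [0.0, 1.0, 0.2], [0.9, 0.0, 1.0]]
truth = [[0], [1], [0]]  # the third query is closest to the wrong place

r1 = recall_at_n(queries, db, truth, n=1)
r2 = recall_at_n(queries, db, truth, n=2)
```

In this toy case the third query retrieves the wrong place first and the correct one second, so Recall@1 is 2/3 while Recall@2 is 1.0.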

Deep Representation Via Convolutional Neural Network for Classification of Spatiotemporal Event Streams

An Event-based Feature Representation Method for Event Stream Classification Using Deep Spiking Neural Networks

Event Stream Learning Using Spatio-Temporal Event Surface

Deep CovDenseSNN: A Hierarchical Event-Driven Dynamic Framework with Spiking Neurons in Noisy Environment

Spiking Neural Network Recognition Method Based on Dynamic Visual Motion Features

Event-based Action Recognition Using Motion Information and Spiking Neural Networks

Event Stream Super-Resolution Via Spatiotemporal Constraint Learning

VMV-GCN: Volumetric Multi-View Based Graph CNN for Event Stream Classification

Multi-scale Harmonic Mean Time Surfaces for Event-based Object Classification

Event Voxel Set Transformer for Spatiotemporal Representation Learning on Event Streams

CIFAR10-DVS: an Event-Stream Dataset for Object Classification

A dynamic vision sensor object recognition model based on trainable event-driven convolution and spiking attention mechanism

Event-Stream Super Resolution using Sigma-Delta Neural Network

[A bio-inspired hierarchical spiking neural network with biological synaptic plasticity for event camera object recognition].

CSNN: an Augmented Spiking Based Framework with Perceptron-Inception

Spatio-Temporal Recurrent Networks for Event-Based Optical Flow Estimation

Hierarchical Spiking-Based Model for Efficient Image Classification with Enhanced Feature Extraction and Encoding.

Representation Learning using Event-based STDP

Event camera object recognition using spatiotemporal event time surface and reward-modulated spike-timing-dependent plasticity learning rule

Spike-EVPR: Deep Spiking Residual Network with Cross-Representation Aggregation for Event-Based Visual Place Recognition

Learning Bottleneck Transformer for Event Image-Voxel Feature Fusion based Classification