Abstract: Event-based cameras are inspired by the sparse and asynchronous spike representation of the biological visual system. However, processing the event data requires either using expensive feature descriptors to transform spikes into frames, or using spiking neural networks that are expensive to train. In this work, we propose a neural network architecture, Reservoir Nodes-enabled neuromorphic vision sensing Network (RN-Net), based on simple convolution layers integrated with dynamic temporal encoding reservoirs for local and global spatiotemporal feature detection with low hardware and training costs. The RN-Net allows efficient processing of asynchronous temporal features, and achieves the highest accuracy reported to date, 99.2%, for DVS128 Gesture, and one of the highest accuracies, 67.5%, for the DVS Lip dataset at a much smaller network size. By leveraging the internal device and circuit dynamics, asynchronous temporal feature encoding can be implemented at very low hardware cost without preprocessing or dedicated memory and arithmetic units. The use of simple DNN blocks and standard backpropagation-based training rules further reduces implementation costs.

What problem does this paper attempt to address?

The paper aims to address the problem of processing the data generated by event-based cameras. Specifically:

1. **Background and Challenges**:
   - Event cameras are inspired by the sparse and asynchronous spike representation of biological vision systems. However, processing event data usually requires either expensive feature descriptors that convert spikes into frames, or spiking neural networks (SNNs) that are costly to train.
2. **Proposed Method**:
   - The paper proposes a new neural network architecture, RN-Net (Reservoir Nodes-enabled neuromorphic vision sensing Network), which uses dynamic temporal encoding at the sensor and simple deep neural network (DNN) blocks to process spatiotemporal features.
   - RN-Net combines two modules: reservoir nodes for local temporal feature encoding (Rin) and reservoir nodes for global temporal feature encoding (Rf). Together they let RN-Net handle asynchronous event data efficiently and extract spatiotemporal features without increasing complexity.
3. **Main Contributions**:
   - A neural network architecture combining multiple reservoir layers and DNN blocks is proposed to process the temporal and spatial information in the asynchronous event streams generated by event cameras.
   - Sensor-based reservoir nodes (RNs) built on short-term memory (STM) memristors provide richer spike encoding, reduce costs, and simplify the DNN modules, thereby improving operational and training efficiency.
   - RN-Net achieved the highest accuracy reported to date on tasks such as CIFAR10-DVS, N-Caltech 101, DVS128 Gesture (99.2%), and N-CARS, with a network size an order of magnitude smaller than other networks of similar capacity.

In summary, the paper addresses the efficient processing of event-based vision sensing data by introducing RN-Net, whose main advantage lies in low-cost spatiotemporal feature extraction.
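The temporal encoding idea described above can be illustrated with a small sketch. It models each pixel's reservoir node as an exponentially decaying state that is bumped whenever an event arrives, so recent events leave a stronger trace than old ones. This is only an assumed stand-in for short-term memory dynamics, not the paper's device model: the function name `encode_events`, the time constant `tau`, and the increment `gain` are all hypothetical.

```python
import math

def encode_events(events, height, width, tau=0.05, gain=1.0):
    """Encode an event stream into a decaying per-pixel state map.

    events: iterable of (t, x, y) tuples sorted by time t (seconds).
    tau:    assumed decay time constant of each node (hypothetical value).
    gain:   assumed state increment per event (hypothetical value).
    Returns the state map (list of rows) at the time of the last event.
    """
    state = [[0.0] * width for _ in range(height)]
    last_t = None
    for t, x, y in events:
        if last_t is not None:
            # Every node forgets exponentially between consecutive events.
            decay = math.exp(-(t - last_t) / tau)
            state = [[v * decay for v in row] for row in state]
        # The node at the event's pixel is driven by the incoming spike.
        state[y][x] += gain
        last_t = t
    return state

# Two events 50 ms apart: the older one has decayed by a factor of e.
frame = encode_events([(0.00, 1, 1), (0.05, 2, 2)], height=4, width=4)
```

The resulting map can be read out at any time and passed to ordinary convolution layers, which is the general role the summary assigns to the reservoir nodes; how RN-Net actually samples and combines the Rin and Rf states is not specified here.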

RN-Net: Reservoir Nodes-Enabled Neuromorphic Vision Sensing Network

RSNN: Recurrent Spiking Neural Networks for Dynamic Spatial-Temporal Information Processing

NeuSpike-Net: High Speed Video Reconstruction via Bio-inspired Neuromorphic Cameras

Hierarchical Spiking-Based Model for Efficient Image Classification with Enhanced Feature Extraction and Encoding

EDeNN: Event Decay Neural Networks for low latency vision

Deep CovDenseSNN: A Hierarchical Event-Driven Dynamic Framework with Spiking Neurons in Noisy Environment

Event-based Video Reconstruction via Potential-assisted Spiking Neural Network

Spiking Neural Network Recognition Method Based on Dynamic Visual Motion Features

NeuroZoom: Denoising and Super Resolving Neuromorphic Events and Spikes

Sparse Temporal Encoding of Visual Features for Robust Object Recognition by Spiking Neurons

ASIE: An Asynchronous SNN Inference Engine for AER Events Processing

Sparser spiking activity can be better: Feature Refine-and-Mask spiking neural network for event-based visual recognition

Embodied Neuromorphic Vision with Event-Driven Random Backpropagation

Computational event-driven vision sensors for in-sensor spiking neural networks

A Cost-Efficient High-Speed VLSI Architecture for Spiking Convolutional Neural Network Inference Using Time-Step Binary Spike Maps

VTSNN: a virtual temporal spiking neural network

Razor SNN: Efficient Spiking Neural Network with Temporal Embeddings

Ultra-low Latency Spiking Neural Networks with Spatio-Temporal Compression and Synaptic Convolutional Block

A 1024-Neuron 1M-Synapse Event-Driven SNN Accelerator for DVS Applications

An Artificial Visual Neuron with Multiplexed Rate and Time-to-first-spike Coding