Abstract: Event cameras are bio-inspired sensors that respond to local changes in light intensity and feature low latency, high energy efficiency, and high dynamic range. Meanwhile, Spiking Neural Networks (SNNs) have gained significant attention due to their remarkable efficiency and fault tolerance. By synergistically harnessing the energy efficiency inherent in event cameras and the spike-based processing capabilities of SNNs, their integration could enable ultra-low-power application scenarios, such as action recognition tasks. However, existing approaches often entail converting asynchronous events into conventional frames, leading to additional data mapping efforts and a loss of sparsity, contradicting the design concept of SNNs and event cameras. To address this challenge, we propose SpikePoint, a novel end-to-end point-based SNN architecture. SpikePoint excels at processing sparse event cloud data, effectively extracting both global and local features through a singular-stage structure. Leveraging the surrogate training method, SpikePoint achieves high accuracy with few parameters and maintains low power consumption, specifically employing the identity mapping feature extractor on diverse datasets. SpikePoint achieves state-of-the-art (SOTA) performance on four event-based action recognition datasets using only 16 timesteps, surpassing other SNN methods. Moreover, it also achieves SOTA performance across all methods on three datasets, utilizing approximately 0.3% of the parameters and 0.5% of power consumption employed by artificial neural networks (ANNs). These results emphasize the significance of Point Cloud and pave the way for many ultra-low-power event-based data processing applications.

What problem does this paper attempt to address?

The main problem this paper addresses is how to combine event cameras and Spiking Neural Networks (SNNs) efficiently for action recognition tasks. Specifically, the paper targets the following challenges:

1. **Maintaining Sparsity and Temporal Information**: Existing SNN methods usually convert asynchronous events into regular frames, which adds data-mapping work and discards sparsity and temporal information, contrary to the original design intent of SNNs and event cameras.
2. **Improving Energy Efficiency and Accuracy**: Existing ANN-based methods perform well in terms of accuracy but consume a lot of energy and cannot fully exploit the low-power advantage of event cameras. Existing SNN methods are energy-efficient, but their accuracy still needs improvement.
3. **Directly Processing Event Data**: To fully exploit the advantages of event cameras and SNNs, a method that processes raw event data directly is needed, avoiding additional data conversion steps.

To this end, the authors propose **SpikePoint**, a new point-cloud-based spiking neural network architecture designed for action recognition with event cameras. SpikePoint addresses the above problems in the following ways:

- **Preserving Fine-grained Temporal Features and Sparsity**: SpikePoint treats the input directly as a point cloud rather than as stacked event frames, preserving the fine-grained temporal features and sparsity of the original events.
- **Single-stage Structure**: Unlike multi-stage hierarchical structures, SpikePoint adopts a lightweight single-stage structure that extracts both local and global features while reducing the number of parameters and the computational complexity.
- **Innovative Encoding Method**: To handle relative position data that contains negative values, SpikePoint introduces a new encoding method that keeps the information representation symmetric and accurate.

With these improvements, SpikePoint achieves state-of-the-art performance on multiple event-camera-based action recognition datasets while using far fewer parameters and far less energy than traditional methods. For example, on the DVS128 Gesture dataset, SpikePoint achieves an accuracy of 98.74% with only 0.58M parameters, and on the Daily DVS dataset, it achieves an accuracy of 97.92% with 0.16M parameters. In addition, SpikePoint demonstrates good adaptability and generalization on large-scale datasets such as HMDB51-DVS and UCF101-DVS, indicating its potential for practical applications.
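
The difference between stacking events into frames and treating them as a point cloud can be illustrated with a minimal sketch. It is not the authors' preprocessing pipeline: the function names `events_to_point_cloud` and `events_to_frames`, the `(x, y, t, p)` column layout, the 128x128 sensor size, and the synthetic random events are all assumptions made for illustration. Only the choice of 16 temporal bins echoes the 16 timesteps mentioned in the abstract.

```python
import numpy as np


def events_to_point_cloud(events, sensor_size):
    """Map events (N, 4) with columns x, y, t, p to an (N, 3) point cloud.

    Each event stays one point; x, y, t are normalised to [0, 1].
    Polarity is dropped here purely to keep the sketch short.
    """
    width, height = sensor_size
    x, y, t = events[:, 0], events[:, 1], events[:, 2]
    t_span = max(t.max() - t.min(), 1e-9)  # avoid division by zero
    return np.stack(
        [x / (width - 1), y / (height - 1), (t - t.min()) / t_span], axis=1
    )


def events_to_frames(events, sensor_size, num_bins):
    """Accumulate the same events into a dense (num_bins, H, W) count tensor."""
    width, height = sensor_size
    x = events[:, 0].astype(int)
    y = events[:, 1].astype(int)
    t = events[:, 2]
    t_span = max(t.max() - t.min(), 1e-9)
    bins = np.minimum(((t - t.min()) / t_span * num_bins).astype(int), num_bins - 1)
    frames = np.zeros((num_bins, height, width), dtype=np.int32)
    np.add.at(frames, (bins, y, x), 1)  # unbuffered add handles repeated pixels
    return frames


# Synthetic events (an assumption; a real recording would come from a DVS file).
rng = np.random.default_rng(0)
num_events = 1024
sensor_size = (128, 128)  # (width, height)
events = np.column_stack([
    rng.integers(0, sensor_size[0], num_events),  # x
    rng.integers(0, sensor_size[1], num_events),  # y
    np.sort(rng.random(num_events)),              # t, ascending
    rng.integers(0, 2, num_events),               # polarity
])

cloud = events_to_point_cloud(events, sensor_size)
frames = events_to_frames(events, sensor_size, num_bins=16)

print("point-cloud values:", cloud.size)   # 1024 points * 3 coordinates = 3072
print("frame-tensor cells:", frames.size)  # 16 * 128 * 128 = 262144
print("non-zero frame cells:", np.count_nonzero(frames))
```

In this toy setting the frame tensor stores hundreds of thousands of cells, almost all of them zero, and collapses every timestamp inside a bin to the same index, whereas the point cloud keeps one row per event with its own timestamp.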

SpikePoint: An Efficient Point-based Spiking Neural Network for Event Cameras Action Recognition

Event-based Action Recognition Using Motion Information and Spiking Neural Networks

Towards Event Camera Signal Recognition Using a Lightweight Spiking Neural Network

Towards event camera signal recognition with lightweight spiking

Spiking Neural Network as Adaptive Event Stream Slicer

Spike-HAR++: an Energy-Efficient and Lightweight Parallel Spiking Transformer for Event-Based Human Action Recognition

Spiking Neural Network Recognition Method Based on Dynamic Visual Motion Features

ReSpike: Residual Frames-based Hybrid Spiking Neural Networks for Efficient Action Recognition

An Event-based Feature Representation Method for Event Stream Classification Using Deep Spiking Neural Networks

Spiking PointNet: Spiking Neural Networks for Point Clouds

Spiking PointCNN: an Efficient Converted Spiking Neural Network under a Flexible Framework

[A bio-inspired hierarchical spiking neural network with biological synaptic plasticity for event camera object recognition].

Eye Tracking Based on Event Camera and Spiking Neural Network

Event-driven Spiking Neural Networks with Spike-Based Learning

EAS-SNN: End-to-End Adaptive Sampling and Representation for Event-based Detection with Recurrent Spiking Neural Networks

TTPOINT: A Tensorized Point Cloud Network for Lightweight Action Recognition with Event Cameras

Training Robust Spiking Neural Networks with ViewPoint Transform and SpatioTemporal Stretching

Hierarchical Spiking-Based Model for Efficient Image Classification with Enhanced Feature Extraction and Encoding.

Effective AER Object Classification Using Segmented Probability-Maximization Learning in Spiking Neural Networks

CSNN: an Augmented Spiking Based Framework with Perceptron-Inception

Spike Trains Encoding Optimization for Spiking Neural Networks Implementation in FPGA