SpikePoint: An Efficient Point-based Spiking Neural Network for Event Cameras Action Recognition

Hongwei Ren, Yue Zhou, Yulong Huang, Haotian Fu, Xiaopeng Lin, Jie Song, Bojun Cheng
DOI: https://doi.org/10.48550/arXiv.2310.07189
2024-01-23
Abstract: Event cameras are bio-inspired sensors that respond to local changes in light intensity and feature low latency, high energy efficiency, and high dynamic range. Meanwhile, Spiking Neural Networks (SNNs) have gained significant attention due to their remarkable efficiency and fault tolerance. By synergistically harnessing the energy efficiency inherent in event cameras and the spike-based processing capabilities of SNNs, their integration could enable ultra-low-power application scenarios, such as action recognition tasks. However, existing approaches often entail converting asynchronous events into conventional frames, leading to additional data mapping efforts and a loss of sparsity, contradicting the design concept of SNNs and event cameras. To address this challenge, we propose SpikePoint, a novel end-to-end point-based SNN architecture. SpikePoint excels at processing sparse event cloud data, effectively extracting both global and local features through a singular-stage structure. Leveraging the surrogate training method, SpikePoint achieves high accuracy with few parameters and maintains low power consumption, specifically employing the identity mapping feature extractor on diverse datasets. SpikePoint achieves state-of-the-art (SOTA) performance on four event-based action recognition datasets using only 16 timesteps, surpassing other SNN methods. Moreover, it also achieves SOTA performance across all methods on three datasets, utilizing approximately 0.3% of the parameters and 0.5% of power consumption employed by artificial neural networks (ANNs). These results emphasize the significance of Point Cloud and pave the way for many ultra-low-power event-based data processing applications.
Computer Vision and Pattern Recognition
What problem does this paper attempt to address?
The paper asks how event cameras and Spiking Neural Networks (SNNs) can be combined efficiently for action recognition. It targets three challenges:

1. **Maintaining Sparsity and Temporal Information**: Existing approaches usually convert asynchronous events into conventional frames. This adds data-mapping work and discards sparsity and temporal information, which contradicts the design intent of both SNNs and event cameras.
2. **Improving Energy Efficiency and Accuracy**: Existing ANN-based methods are accurate but energy-hungry, so they cannot fully exploit the low-power advantage of event cameras. Existing SNN methods are energy-efficient, but their accuracy still needs to improve.
3. **Directly Processing Event Data**: To fully exploit the advantages of event cameras and SNNs, a method is needed that processes raw event data directly and avoids additional data conversion steps.

To this end, the authors propose **SpikePoint**, a point-cloud-based spiking neural network architecture designed for action recognition with event cameras. SpikePoint addresses the challenges above in the following ways:

- **Preserving Fine-grained Temporal Features and Sparsity**: SpikePoint treats the input as a point cloud rather than as stacked event frames, which preserves the fine-grained temporal features and sparsity of the original events.
- **Single-stage Structure**: Unlike multi-stage hierarchical structures, SpikePoint adopts a lightweight single-stage structure that extracts both local and global features while reducing the number of parameters and the computational complexity.
- **Innovative Encoding Method**: To handle relative position data that contains negative values, SpikePoint introduces a new encoding method that keeps the information representation symmetric and accurate.

Through these improvements, SpikePoint achieves state-of-the-art performance on multiple event-camera-based action recognition datasets while using far fewer parameters and far less energy than ANN-based methods (approximately 0.3% of the parameters and 0.5% of the power consumption, according to the abstract). For example, on the DVS128 Gesture dataset, SpikePoint achieves an accuracy of 98.74% with only 0.58M parameters, and on the Daily DVS dataset, it achieves an accuracy of 97.92% with 0.16M parameters. SpikePoint also adapts and generalizes well on larger datasets such as HMDB51-DVS and UCF101-DVS, which indicates its potential in practical applications.
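
The contrast between feeding events as a point cloud and stacking them into frames can be illustrated with a small sketch. This is not the authors' pipeline: the synthetic events, the helper names (`make_events`, `to_point_cloud`, `to_frames`), the 1024-event sample size, and the 100 ms window are all assumptions chosen for illustration. Only the 128x128 resolution (as in DVS128 Gesture) and the figure of 16 bins (echoing the 16 timesteps in the abstract) are borrowed from the text, and the bins here are plain frame slots, not SNN timesteps.

```python
import random

WIDTH, HEIGHT = 128, 128      # sensor resolution, as in DVS128 Gesture
DURATION_US = 100_000         # assumed 100 ms window (hypothetical)
NUM_BINS = 16                 # assumed number of frame slots (hypothetical)

def make_events(n, seed=0):
    """Generate n synthetic events (x, y, t_us, polarity), sorted by time."""
    rng = random.Random(seed)
    events = [
        (rng.randrange(WIDTH), rng.randrange(HEIGHT),
         rng.randrange(DURATION_US), rng.choice((0, 1)))
        for _ in range(n)
    ]
    events.sort(key=lambda e: e[2])
    return events

def to_point_cloud(events):
    """Keep every event as one (x, y, t) point, normalized to [0, 1]."""
    return [
        (x / (WIDTH - 1), y / (HEIGHT - 1), t / DURATION_US)
        for x, y, t, _ in events
    ]

def to_frames(events):
    """Accumulate event counts into NUM_BINS dense HEIGHT x WIDTH frames."""
    frames = [[[0] * WIDTH for _ in range(HEIGHT)] for _ in range(NUM_BINS)]
    for x, y, t, _ in events:
        b = min(t * NUM_BINS // DURATION_US, NUM_BINS - 1)
        frames[b][y][x] += 1
    return frames

events = make_events(1024)
cloud = to_point_cloud(events)
frames = to_frames(events)

cloud_values = len(cloud) * 3                      # one (x, y, t) per event
dense_values = NUM_BINS * HEIGHT * WIDTH           # every pixel of every frame
nonzero = sum(1 for f in frames for row in f for v in row if v)

print(f"point cloud stores {cloud_values} values")
print(f"frame stack stores {dense_values} values, {nonzero} of them nonzero")
```

In this toy setting the frame stack allocates a value for every pixel in every bin even though at most one entry per event is nonzero, and each event's timestamp is reduced to a bin index. The point cloud keeps one entry per event with its full timestamp, which is the sparsity and temporal-detail argument made above.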