Abstract: Recent advancements in bio-inspired visual sensing and neuromorphic computing have led to the development of various highly efficient bio-inspired solutions with real-world applications. One notable application integrates event-based cameras with spiking neural networks (SNNs) to process event-based sequences that are asynchronous and sparse, making them difficult to handle. In this project, we develop a convolutional spiking neural network (CSNN) architecture that leverages convolutional operations and recurrent properties of a spiking neuron to learn the spatial and temporal relations in the ASL-DVS gesture dataset. The ASL-DVS gesture dataset is a neuromorphic dataset containing hand gestures when displaying 24 letters (A to Y, excluding J and Z due to the nature of their symbols) from the American Sign Language (ASL). We performed classification on a pre-processed subset of the full ASL-DVS dataset to identify letter signs and achieved 100% training accuracy. Specifically, this was achieved by training in the Google Cloud compute platform while using a learning rate of 0.0005, batch size of 25 (total of 20 batches), 200 iterations, and 10 epochs.

What problem does this paper attempt to address?

The main problem this paper addresses is the design of an architecture based on a convolutional spiking neural network (CSNN) to process and classify gesture event data from the ASL-DVS dataset. Specifically, the researchers use a CSNN to learn and recognize the spatio-temporal relationships in American Sign Language (ASL) gesture data. These data are captured by event-based cameras (DVS) and are asynchronous and sparse, which makes them difficult for traditional processing methods to handle.

### Decomposition of the main problems:

1. **Asynchronous and sparse data processing**:
   - The data captured by event-based cameras (DVS) are asynchronous and sparse. Unlike traditional frame-based cameras, a DVS records only events of pixel changes, so processing this type of data effectively is a challenge.
2. **Spatio-temporal feature extraction**:
   - The gestures in the ASL-DVS dataset have strong spatio-temporal correlations. Extracting these features with a CSNN and using them for classification is another key problem.
3. **High-precision classification**:
   - The researchers aim for high-precision gesture classification on the ASL-DVS dataset by training the CSNN model. They ran their experiments on the Google Cloud platform and report 100% training accuracy.

### Overview of solutions:

- **CSNN architecture design**:
  - Using convolutional operations and the recurrent properties of spiking neurons, the researchers construct a CSNN architecture that can learn the spatio-temporal relationships in ASL-DVS gesture data.
- **Data pre-processing**:
  - The ASL-DVS dataset was pre-processed, including conversion from the original AEDAT format to CSV format for easier data exploration and processing.
- **Training and optimization**:
  - The model was trained on the Google Cloud platform using the Adam optimizer and the Mean Squared Error (MSE) loss function. By adjusting hyper-parameters such as the learning rate and batch size, high training and validation accuracy were achieved.

### Formula presentation:

In the CSNN, the membrane potential of the Leaky Integrate-and-Fire (LIF) neuron evolves according to:

\[ \tau \frac{dU(t)}{dt} = -U(t) + R I_{in}(t) \]

where:

- \( U(t) \) is the membrane potential at time \( t \),
- \( R \) is the membrane resistance,
- \( I_{in}(t) \) is the input current,
- \( \tau \) is the time constant.

The discrete-time membrane potential update is:

\[ U(t) = \beta U(t - 1) + (1 - \beta) I_{in}(t) \]

where \( \beta \) is the decay rate of the membrane potential. This form follows from a forward-Euler discretization of the differential equation with time step \( \Delta t \) and \( R = 1 \), which gives \( \beta \approx 1 - \Delta t / \tau \).

The convolutional operation can be represented as:

\[ y[i, j] = \sum_{m = -\infty}^{\infty} \sum_{n = -\infty}^{\infty} x[i + m, j + n] \cdot K[m, n] \]

where:

- \( y[i, j] \) is the convolved feature map,
- \( x \) is the input tensor,
- \( K \) is the convolution kernel, taken to be zero outside its finite support.

Written without a kernel flip, this is the cross-correlation form that deep-learning frameworks conventionally call convolution.

Using these methods, the researchers report 100% training accuracy on a pre-processed subset of ASL-DVS and demonstrate the potential of CSNNs for processing event-driven data.
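To make the discrete LIF update and the convolution sum above concrete, the following is a minimal NumPy sketch of one convolution-plus-LIF stage applied over a short sequence of event frames. It is not the paper's implementation: the function names (`correlate2d_valid`, `lif_step`, `run_conv_lif`), the firing threshold, and the reset-to-zero rule are assumptions introduced for illustration, and the input is a toy array rather than ASL-DVS data.

```python
import numpy as np


def correlate2d_valid(x, k):
    """Compute y[i, j] = sum_m sum_n x[i + m, j + n] * k[m, n].

    Only output positions where the kernel fits fully inside x are kept
    ("valid" mode), so the infinite sums reduce to the kernel support.
    """
    kh, kw = k.shape
    out_h = x.shape[0] - kh + 1
    out_w = x.shape[1] - kw + 1
    y = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            y[i, j] = np.sum(x[i:i + kh, j:j + kw] * k)
    return y


def lif_step(u_prev, i_in, beta):
    """Discrete LIF update: U(t) = beta * U(t - 1) + (1 - beta) * I_in(t)."""
    return beta * u_prev + (1.0 - beta) * i_in


def run_conv_lif(frames, kernel, beta=0.9, threshold=1.0):
    """Run one conv + LIF stage over a sequence of 2D frames.

    Assumptions (not stated in the source): a neuron emits a spike when its
    membrane potential reaches `threshold`, and is then reset to zero.
    Returns the spike trains (T, H', W') and the final membrane potential.
    """
    kh, kw = kernel.shape
    u = np.zeros((frames.shape[1] - kh + 1, frames.shape[2] - kw + 1))
    spikes = []
    for frame in frames:
        i_in = correlate2d_valid(frame, kernel)  # spatial feature extraction
        u = lif_step(u, i_in, beta)              # temporal integration with leak
        s = (u >= threshold).astype(float)       # assumed threshold firing rule
        u = u * (1.0 - s)                        # assumed reset-to-zero on spike
        spikes.append(s)
    return np.stack(spikes), u


if __name__ == "__main__":
    # Toy input: 5 time steps of sparse binary 8x8 "event frames".
    rng = np.random.default_rng(0)
    frames = (rng.random((5, 8, 8)) < 0.2).astype(float)
    kernel = np.ones((3, 3)) / 9.0
    spikes, u_final = run_conv_lif(frames, kernel, beta=0.9, threshold=0.05)
    print("spike tensor shape:", spikes.shape)
    print("spikes per time step:", spikes.sum(axis=(1, 2)))
```

The membrane potential carries state from one frame to the next, which is the recurrent property the summary refers to, while the sliding-window sum supplies the spatial features. A trainable model would replace the fixed kernel with learned weights and use a surrogate gradient for the non-differentiable threshold; both are outside the scope of this sketch.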

Using CSNNs to Perform Event-based Data Processing & Classification on ASL-DVS

RSNN: Recurrent Spiking Neural Networks for Dynamic Spatial-Temporal Information Processing

Event-based Action Recognition Using Motion Information and Spiking Neural Networks

Sign Language Gesture Recognition and Classification Based on Event Camera with Spiking Neural Networks

Event-Based Multimodal Spiking Neural Network with Attention Mechanism

A New Spiking Convolutional Recurrent Neural Network (SCRNN) With Applications to Event-Based Hand Gesture Recognition

A Gesture Recognition Method Based On Spiking Neural Networks For Cognition Development

Deep CovDenseSNN: A Hierarchical Event-Driven Dynamic Framework with Spiking Neurons in Noisy Environment

CSNN: an Augmented Spiking Based Framework with Perceptron-Inception

Hierarchical Spiking-Based Model for Efficient Image Classification with Enhanced Feature Extraction and Encoding.

Digit Recognition using Multimodal Spiking Neural Networks

Spiking Neural Networks for event-based action recognition: A new task to understand their advantage

EvSegSNN: Neuromorphic Semantic Segmentation for Event Data

Rapid Decoding of Hand Gestures in Electrocorticography Using Recurrent Neural Networks.

Efficient Gesture Recognition on Spiking Convolutional Networks Through Sensor Fusion of Event-Based and Depth Data

Scalable Event-by-event Processing of Neuromorphic Sensory Signals With Deep State-Space Models

Learning and real-time classification of hand-written digits with spiking neural networks

A dynamic vision sensor object recognition model based on trainable event-driven convolution and spiking attention mechanism

Live American Sign Language Letter Classification with Convolutional Neural Networks

Towards Low-latency Event-based Visual Recognition with Hybrid Step-wise Distillation Spiking Neural Networks