Abstract: Speech recognition (SR) has been improved significantly by artificial neural networks (ANNs), but ANNs suffer from biological implausibility and excessive power consumption because of the nonlocal transfer of real-valued errors and weights. Spiking neural networks (SNNs) have the potential to overcome these drawbacks through efficient spike-based communication and their natural ability to use the kinds of synaptic plasticity rules found in the brain for weight modification. However, existing SNN models for SR either performed poorly or were trained in biologically implausible ways. In this paper, we present a biologically inspired convolutional SNN model for SR. The network adopts the time-to-first-spike coding scheme for fast and efficient information processing. A biological learning rule, spike-timing-dependent plasticity (STDP), is used to adjust the synaptic weights of convolutional neurons to form receptive fields in an unsupervised way. In the convolutional structure, we introduce a local weight sharing strategy, which could lead to better feature extraction from speech signals than global weight sharing. We first evaluated the SNN model with a linear support vector machine (SVM) on the TIDIGITS dataset, where it achieved an accuracy of 97.5%, comparable to the best results of ANNs. Further analysis of the network outputs showed that they are not only more linearly separable but also have fewer dimensions and become sparse. To further confirm the validity of our model, we trained it on a more difficult recognition task based on the TIMIT dataset, where it achieved a high accuracy of 93.8%. Moreover, a linear spike-based classifier, the tempotron, can also achieve accuracies very close to those of the SVM on both tasks. These results demonstrate that an STDP-based convolutional SNN model equipped with local weight sharing and temporal coding is capable of solving the SR task accurately and efficiently.
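
The two mechanisms named in the abstract, time-to-first-spike coding and STDP, can be illustrated with a minimal sketch. It is not the paper's implementation: the function names `ttfs_encode` and `stdp_update`, the linear intensity-to-latency mapping, the multiplicative `w * (1 - w)` update form, and the values of `t_max`, `a_plus`, and `a_minus` are all assumptions chosen for illustration.

```python
def ttfs_encode(features, t_max=1.0):
    """Map non-negative feature intensities to first-spike times.

    Assumed linear mapping: the strongest input fires at time 0,
    weaker inputs fire later, and a zero input never fires (time = inf).
    """
    peak = max(features)
    if peak <= 0:
        return [float("inf")] * len(features)
    return [t_max * (1.0 - x / peak) if x > 0 else float("inf") for x in features]


def stdp_update(weights, pre_times, post_time, a_plus=0.004, a_minus=0.003):
    """Apply one simplified STDP step to a neuron that fired at post_time.

    Assumed rule: synapses whose input spiked no later than the output
    spike are strengthened; all others (later or silent) are weakened.
    """
    updated = []
    for w, t_pre in zip(weights, pre_times):
        rate = a_plus if t_pre <= post_time else -a_minus
        # The w * (1 - w) factor shrinks the step near 0 and 1,
        # so weights that start inside (0, 1) stay inside (0, 1).
        updated.append(w + rate * w * (1.0 - w))
    return updated


# Hypothetical feature values for four input channels.
spike_times = ttfs_encode([0.9, 0.1, 0.0, 0.5])
new_weights = stdp_update([0.5, 0.5, 0.5, 0.5], spike_times, post_time=0.5)
```

In this sketch the strongest channel spikes first, and only the channels that spiked before the output neuron have their weights increased, which is the sense in which STDP lets early, informative inputs shape a receptive field without labels.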

Temporal Information Reconstruction and Non-Aligned Residual in Spiking Neural Networks for Speech Classification

RSNN: Recurrent Spiking Neural Networks for Dynamic Spatial-Temporal Information Processing

Bipolar Population Threshold Encoding for Audio Recognition with Deep Spiking Neural Networks

Attention-Based Deep Spiking Neural Networks for Temporal Credit Assignment Problems.

Unsupervised speech recognition through spike-timing-dependent plasticity in a convolutional spiking neural network

Temporal Coding of Local Spectrogram Features for Robust Sound Recognition

A Biological Population Threshold Coding with Robust Feature Extraction and Neuronal Jitter for SNN-based Speech Recognition.

Spike-based Encoding and Learning of Spectrum Features for Robust Sound Recognition.

Efficient Speech Command Recognition Leveraging Spiking Neural Network and Curriculum Learning-based Knowledge Distillation

A Spiking Neural Network Model for Sound Recognition.

Deep Spiking Neural Network Using Spatio-temporal Backpropagation with Variable Resistance.

Temporal Spiking Generative Adversarial Networks for Heading Direction Decoding

Temporal Reversed Training for Spiking Neural Networks with Generalized Spatio-Temporal Representation

STSC-SNN: Spatio-Temporal Synaptic Connection with temporal convolution and attention for spiking neural networks

A Spiking Neural Network with Distributed Keypoint Encoding for Robust Sound Recognition.

A Hybrid Learning Framework for Deep Spiking Neural Networks with One-Spike Temporal Coding

Single Channel Speech Enhancement Using Temporal Convolutional Recurrent Neural Networks.

An Integrated System For Robust Gender Classification With Convolutional Restricted Boltzmann Machine And Spiking Neural Network

Temporal Pattern Recognition Using Spiking Neural Networks for Cortical Neuronal Spike Train Decoding

Temporal Contrastive Learning for Spiking Neural Networks

Hybrid photonic deep convolutional residual spiking neural networks for text classification