You Only Spike Once: Improving Energy-Efficient Neuromorphic Inference to ANN-Level Accuracy

Srivatsa P, Kyle Timothy Ng Chu, Burin Amornpaisannon, Yaswanth Tavva, Venkata Pavan Kumar Miriyala, Jibin Wu, Malu Zhang, Haizhou Li, Trevor E. Carlson
DOI: https://doi.org/10.48550/arXiv.2006.09982
2020-11-08
Abstract: In the past decade, advances in Artificial Neural Networks (ANNs) have allowed them to perform extremely well for a wide range of tasks. In fact, they have reached human parity when performing image recognition, for example. Unfortunately, the accuracy of these ANNs comes at the expense of a large number of cache and/or memory accesses and compute operations. Spiking Neural Networks (SNNs), a type of neuromorphic, or brain-inspired network, have recently gained significant interest as power-efficient alternatives to ANNs, because they are sparse, accessing very few weights, and typically only use addition operations instead of the more power-intensive multiply-and-accumulate (MAC) operations. The vast majority of neuromorphic hardware designs support rate-encoded SNNs, where the information is encoded in spike rates. Rate-encoded SNNs could be seen as inefficient as an encoding scheme because it involves the transmission of a large number of spikes. A more efficient encoding scheme, Time-To-First-Spike (TTFS) encoding, encodes information in the relative time of arrival of spikes. While TTFS-encoded SNNs are more efficient than rate-encoded SNNs, they have, up to now, performed poorly in terms of accuracy compared to previous methods. Hence, in this work, we aim to overcome the limitations of TTFS-encoded neuromorphic systems. To accomplish this, we propose: (1) a novel optimization algorithm for TTFS-encoded SNNs converted from ANNs and (2) a novel hardware accelerator for TTFS-encoded SNNs, with a scalable and low-power design. Overall, our work in TTFS encoding and training improves the accuracy of SNNs to achieve state-of-the-art results on MNIST MLPs, while reducing power consumption by 1.46$\times$ over the state-of-the-art neuromorphic hardware.
Neural and Evolutionary Computing, Artificial Intelligence, Hardware Architecture, Machine Learning
What problem does this paper attempt to address?
This paper addresses how to improve the accuracy of energy-efficient Spiking Neural Networks (SNNs) that use Time-to-First-Spike (TTFS) encoding, so that they reach accuracy comparable to that of Artificial Neural Networks (ANNs). Specifically:

1. **Trade-off between the efficiency and accuracy of TTFS encoding**: TTFS encoding is more energy-efficient than conventional rate encoding because it needs only one spike to transmit information, but its classification accuracy is usually lower. The authors therefore seek a method that improves the accuracy of TTFS-encoded SNNs while keeping their energy-efficiency advantage.
2. **Design of hardware accelerators**: Existing hardware accelerators (such as IBM's TrueNorth) perform well when implementing rate-based SNNs, but fail to fully exploit sparsity when running time-based SNNs. The authors therefore propose a new hardware accelerator design that better supports TTFS-encoded SNNs and significantly reduces power consumption.

To solve these problems, the authors make two main contributions:

1. **Optimization algorithm**: A new training algorithm converts pre-trained ANNs into TTFS-encoded SNNs and reduces the errors accumulated during conversion by fine-tuning the network weights. This brings TTFS-encoded SNNs close to ANN accuracy (with only a 0.2% difference), making them suitable for tasks traditionally handled by ANNs.
2. **New hardware accelerator**: A new hardware accelerator named You Only Spike Once (YOSO) is designed and optimized specifically for TTFS-encoded SNNs. By taking advantage of the sparsity of SNNs, it significantly reduces the number of memory accesses and thereby improves energy efficiency.
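The link between spike sparsity and memory accesses can be illustrated with a small counting sketch. It is not the YOSO design: it assumes a hypothetical event-driven layer that fetches one row of weights per incoming spike, a hypothetical rate code in which an input with activation `a` emits about `a * timesteps` spikes, and a TTFS code in which each non-zero input spikes exactly once. All function names are made up for illustration.

```python
def dense_weight_reads(num_inputs, num_outputs):
    """A dense ANN layer reads every weight once per inference (one MAC each)."""
    return num_inputs * num_outputs


def rate_spike_counts(activations, timesteps):
    """Assumed rate code: spike count proportional to the activation in [0, 1]."""
    return [round(a * timesteps) for a in activations]


def ttfs_spike_counts(activations):
    """Assumed TTFS code: each non-zero input spikes once, zero inputs stay silent."""
    return [1 if a > 0 else 0 for a in activations]


def event_driven_weight_reads(spike_counts, num_outputs):
    """Assumed event-driven layer: each spike fetches one row of num_outputs
    weights, which are added (not multiplied) into the output accumulators."""
    return sum(spike_counts) * num_outputs


activations = [0.0, 0.5, 1.0, 0.0]   # toy input layer with two silent inputs
num_outputs = 3

rate_reads = event_driven_weight_reads(rate_spike_counts(activations, 10), num_outputs)
ttfs_reads = event_driven_weight_reads(ttfs_spike_counts(activations), num_outputs)

print("dense ANN reads:", dense_weight_reads(len(activations), num_outputs))
print("rate-coded reads:", rate_reads)
print("TTFS reads:", ttfs_reads)
```

Under these assumptions the TTFS layer never fetches more weight rows than it has active inputs, whereas the rate-coded count grows with the number of timesteps.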
Through these improvements, the authors show that TTFS-encoded SNNs achieve state-of-the-art results for Multi-Layer Perceptrons (MLPs) on the MNIST dataset, while reducing power consumption by a factor of 1.46 relative to existing state-of-the-art neuromorphic hardware.

### Formula summary

In SNNs with TTFS encoding, the dynamics of the membrane potential can be expressed as:

\[ \frac{dV_i^{\text{mem}}(t)}{dt}=\sum_{j\in\Gamma_i^-}w_{ij}[t - t_j]+b_i \]

where:

- \( V_i^{\text{mem}} \) is the membrane potential of neuron \( i \),
- \( w_{ij} \) is the weight of the synaptic connection from \( j \) to \( i \),
- \( t_j \) is the time of the first spike of the presynaptic neuron \( j \),
- \( \Gamma_i^- \) is the set of all presynaptic neurons that spike before \( t_i \),
- \( b_i \) is the bias of neuron \( i \).

For the next step to follow, the bracket \( [t - t_j] \) in the rate equation is presumably a unit step that switches on once neuron \( j \) has spiked; integrating from a zero initial potential then makes each input contribute \( w_{ij}(t - t_j) \) and the bias contribute \( b_i t \). To determine the first spike time \( t_i \) of each neuron \( i \), the membrane potential is set equal to the threshold \( \theta \):

\[ \theta=\sum_{j\in\Gamma_i^-}w_{ij}[t_i - t_j]+b_i t_i \]

After rearranging the terms, the first spike time \( t_i \) can be expressed as:

\[ t_i=\frac{1}{\mu_i}\left(\theta+\sum_{j\in\Gamma_i^-}w_{ij}t_j\right) \]

where:

\[ \mu_i=\sum_{j\in\Gamma_i^-}w_{ij}+b_i \]

Furthermore, the instantaneous firing rate \( r_i \)