A High Energy-Efficiency Multi-core Neuromorphic Architecture for Deep SNN Training

Mingjing Li, Huihui Zhou, Xiaofeng Xu, Zhiwei Zhong, Puli Quan, Xueke Zhu, Yanyu Lin, Wenjie Lin, Hongyu Guo, Junchao Zhang, Yunhao Ma, Wei Wang, Zhengyu Ma, Guoqi Li, Xiaoxin Cui, Yonghong Tian
2024-11-26
Abstract: There is a growing need for edge training to adapt to dynamically changing environments. Neuromorphic computing is a significant pathway to high-efficiency intelligent computation on energy-constrained edge devices, but existing neuromorphic architectures cannot directly train spiking neural networks (SNNs) with backpropagation. We develop a multi-core neuromorphic architecture with Feedforward-Propagation, Back-Propagation, and Weight-Gradient engines in each core, supporting highly efficient parallel computing at both the engine and core levels. It combines various data flows with sparse-computation optimization by fully leveraging the sparsity in SNN training, achieving a high energy efficiency of 1.05 TFLOPS/W @ FP16 @ 28 nm, a 55% to 85% reduction in DRAM access compared with an A100 GPU in SNN training, and realizing 20-core deep SNN training and 5-worker federated learning on FPGAs. Our study develops the first multi-core neuromorphic architecture supporting direct SNN training, facilitating neuromorphic computing in edge-learnable applications.
Hardware Architecture; Distributed, Parallel, and Cluster Computing; Machine Learning
What problem does this paper attempt to address?
The main problems this paper attempts to solve are:

1. **Limitations of existing neuromorphic architectures**: Current neuromorphic processors (such as TrueNorth, Loihi, and Tianjic) are designed mainly for brain simulation. They support forward computation and local learning methods (such as Hebbian learning and Spike-Timing-Dependent Plasticity, STDP), but not direct training of Spiking Neural Networks (SNNs) based on Backpropagation (BP). As a result, SNN training still relies on GPUs.
2. **Lack of efficient multi-core architectures**: Most existing SNN training architectures are single-core designs. Without a multi-core architecture that supports efficient and flexible model partitioning, deployment, pipelining, and parallel computing, they struggle to meet the requirements of deep SNN training.
3. **High-energy-consumption DRAM/HBM access**: As the number of SNN model parameters grows, frequent DRAM/HBM access leads to high energy consumption and long latency, especially with multiple time steps and complex data dependencies. This poses a challenge to SNN training.

To solve these problems, the authors propose a new multi-core near-memory neuromorphic architecture that supports BP-based SNN training. Specifically:

- **Multi-core architecture design**: Each computing core contains a Feedforward-Propagation (FP) engine, a Back-Propagation (BP) engine, and a Weight-Gradient (WG) engine, enabling efficient parallel computing.
- **Data-flow optimization**: By jointly designing the computing core, its data flow, and the Network-on-Chip (NoC), the architecture achieves a high level of data reuse during SNN training, reducing the number of DRAM accesses by 55% to 85% compared with the A100 GPU.
- **Sparse-computing optimization**: The architecture fully exploits the sparsity in SNN training (the sparsity of spike signals, of the derivative of the firing function, and of the membrane potential gradient), reducing energy consumption by 45% to 60% by skipping redundant calculations.

In conclusion, this research develops the first multi-core neuromorphic architecture that supports BP-based SNN training, significantly improving energy efficiency and reducing DRAM access, and promoting the development of neuromorphic computing in edge-learnable applications.
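
The division of work among the FP, BP, and WG engines, and the way sparsity lets each of them skip work, can be illustrated in software for a single fully connected spiking layer at one time step. This is only a rough sketch under stated assumptions, not the paper's hardware data flow: the function names (`fp_engine`, `bp_engine`, `wg_engine`), the rectangular surrogate derivative, the threshold of 1.0, and the operation counters are all hypothetical and introduced here for illustration, and dependencies across time steps are ignored.

```python
def surrogate_grad(u, threshold=1.0, width=1.0):
    """Rectangular surrogate derivative of the firing function (assumed choice)."""
    return 1.0 / width if abs(u - threshold) < width / 2 else 0.0


def fp_engine(weights, spikes_in, threshold=1.0):
    """Forward pass: accumulate weight columns only for inputs that spiked."""
    n_out = len(weights)
    u = [0.0] * n_out
    ops = 0
    for j, s in enumerate(spikes_in):
        if s == 0:  # spike sparsity: silent inputs contribute nothing
            continue
        for i in range(n_out):
            u[i] += weights[i][j]
            ops += 1
    spikes_out = [1 if ui >= threshold else 0 for ui in u]
    return u, spikes_out, ops


def bp_engine(weights, u, grad_out, threshold=1.0):
    """Backward pass: membrane-potential gradient, then input-spike gradient."""
    n_in = len(weights[0])
    grad_u = [g * surrogate_grad(ui, threshold) for g, ui in zip(grad_out, u)]
    grad_in = [0.0] * n_in
    ops = 0
    for i, gu in enumerate(grad_u):
        if gu == 0.0:  # zero membrane-potential gradient: skip the whole row
            continue
        for j in range(n_in):
            grad_in[j] += weights[i][j] * gu
            ops += 1
    return grad_u, grad_in, ops


def wg_engine(grad_u, spikes_in):
    """Weight gradient: outer product over nonzero gradients and spiking inputs."""
    active_in = [j for j, s in enumerate(spikes_in) if s]
    grad_w = [[0.0] * len(spikes_in) for _ in grad_u]
    ops = 0
    for i, gu in enumerate(grad_u):
        if gu == 0.0:
            continue
        for j in active_in:
            grad_w[i][j] = gu  # input spike is 1, so the product is just gu
            ops += 1
    return grad_w, ops


# Toy layer: 2 output neurons, 3 inputs, one silent input.
weights = [[0.5, 0.25, 0.75], [0.125, 1.0, 0.25]]
spikes_in = [1, 0, 1]
u, spikes_out, fp_ops = fp_engine(weights, spikes_in)
grad_u, grad_in, bp_ops = bp_engine(weights, u, grad_out=[1.0, 1.0])
grad_w, wg_ops = wg_engine(grad_u, spikes_in)
dense_ops = len(weights) * len(spikes_in)
print(f"ops per engine (dense would be {dense_ops}):", fp_ops, bp_ops, wg_ops)
```

In this toy case each engine performs fewer operations than the dense count of 6, because one input is silent and one neuron has a zero surrogate derivative. The actual savings reported in the paper depend on its hardware data flows and measured sparsity, which this sketch does not model.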