Abstract: There is a growing need for edge training to adapt to dynamically changing environments. Neuromorphic computing represents a significant pathway for high-efficiency intelligent computation at energy-constrained edges, but existing neuromorphic architectures lack the ability to directly train spiking neural networks (SNNs) with backpropagation. We develop a multi-core neuromorphic architecture with Feedforward-Propagation, Back-Propagation, and Weight-Gradient engines in each core, supporting highly efficient parallel computing at both the engine and core levels. It combines various data flows and sparse computation optimization by fully leveraging the sparsity in SNN training, achieving a high energy efficiency of 1.05 TFLOPS/W @ FP16 @ 28 nm and a 55% to 85% reduction in DRAM access compared to the A100 GPU in SNN training, and demonstrating 20-core deep SNN training and 5-worker federated learning on FPGAs. Our study develops the first multi-core neuromorphic architecture supporting direct SNN training, facilitating neuromorphic computing in edge-learnable applications.

What problem does this paper attempt to address?

The main problems that this paper attempts to solve are:

1. **Limitations of existing neuromorphic architectures**: Current neuromorphic processors (such as TrueNorth, Loihi, and Tianjic) are mainly designed for brain simulation. They support forward computation and local learning methods (such as Hebbian learning and Spike-Timing-Dependent Plasticity, STDP), but they do not support direct training of Spiking Neural Networks (SNNs) based on Backpropagation (BP). As a result, SNN training still relies on GPUs.
2. **Lack of efficient multi-core architectures**: Most existing SNN training architectures are single-core designs. Without a multi-core architecture to support efficient and flexible model partitioning, deployment, pipelining, and parallel computing, they struggle to meet the requirements of deep SNN training.
3. **Energy-hungry DRAM/HBM access**: As the number of SNN model parameters grows, frequent access to DRAM/HBM leads to high energy consumption and long latency, especially with multiple time steps and complex data dependencies. This poses a challenge to SNN training.

To solve these problems, the authors propose a new multi-core near-memory neuromorphic architecture that supports BP-based SNN training. Specifically:

- **Multi-core architecture design**: Each computing core contains a Feedforward-Propagation (FP), a Back-Propagation (BP), and a Weight-Gradient (WG) engine, achieving efficient parallel computing.
- **Data-flow optimization**: By jointly designing the computing core, its data flow, and the Network-on-Chip (NoC), the architecture achieves a high level of data reuse during SNN training, reducing the number of DRAM accesses by 55% to 85% compared to the A100 GPU.
- **Sparse-computing optimization**: The architecture exploits the sparsity in SNN training (the sparsity of spike signals, of the derivative of the firing function, and of the membrane potential gradient), reducing energy consumption by 45% to 60% by skipping redundant calculations.

In conclusion, this research develops the first multi-core neuromorphic architecture that supports BP-based SNN training, significantly improving energy efficiency and reducing DRAM access, and promoting the development of neuromorphic computing in edge-learnable applications.
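
To make the division of labor among the FP, BP, and WG engines and the sparsity skipping more concrete, here is a minimal software sketch in plain Python. It is not the paper's hardware design. It assumes a single fully connected layer of leaky integrate-and-fire neurons, a single time step (temporal gradient dependencies are ignored), and a rectangular surrogate derivative for the firing function. The function names, the `decay` and `threshold` parameters, and the toy numbers are all hypothetical and chosen only for illustration.

```python
def surrogate_grad(u, threshold=1.0, width=1.0):
    """Rectangular surrogate derivative of the firing function (assumed choice)."""
    return 1.0 / width if abs(u - threshold) < width / 2 else 0.0


def fp_engine(W, spikes_in, v_prev, decay=0.5, threshold=1.0):
    """Forward pass: accumulate weights only for inputs that spiked."""
    active = [j for j, s in enumerate(spikes_in) if s]
    u = [decay * v + sum(row[j] for j in active) for row, v in zip(W, v_prev)]
    spikes_out = [1 if x >= threshold else 0 for x in u]
    return u, spikes_out, len(W) * len(active)  # last value: additions performed


def bp_engine(W, grad_out, u, threshold=1.0):
    """Backward pass: propagate only nonzero membrane-potential gradients."""
    delta = [g * surrogate_grad(x, threshold) for g, x in zip(grad_out, u)]
    active = [i for i, d in enumerate(delta) if d != 0.0]
    n_in = len(W[0])
    grad_in = [sum(W[i][j] * delta[i] for i in active) for j in range(n_in)]
    return delta, grad_in, n_in * len(active)  # last value: multiplies performed


def wg_engine(delta, spikes_in):
    """Weight gradient: dW[i][j] = delta[i] * s[j], nonzero only where both are."""
    active_in = [j for j, s in enumerate(spikes_in) if s]
    dW = [[0.0] * len(spikes_in) for _ in delta]
    ops = 0
    for i, d in enumerate(delta):
        if d == 0.0:
            continue  # skip neurons with zero membrane-potential gradient
        for j in active_in:
            dW[i][j] = d  # s[j] is 1 for every active input
            ops += 1
    return dW, ops


# Toy layer: 3 output neurons, 4 inputs (illustrative values only).
W = [[0.6, 0.2, 0.5, 0.1],
     [0.1, 0.1, 0.1, 0.9],
     [0.3, 0.4, 0.3, 0.2]]
spikes_in = [1, 0, 1, 0]      # sparse input spikes
v_prev = [0.0, 0.0, 0.0]      # previous membrane potentials
grad_out = [1.0, -0.5, 0.0]   # gradient arriving from the next layer

u, spikes_out, fp_ops = fp_engine(W, spikes_in, v_prev)
delta, grad_in, bp_ops = bp_engine(W, grad_out, u)
dW, wg_ops = wg_engine(delta, spikes_in)

dense_ops = len(W) * len(W[0])
print("output spikes:", spikes_out)
print("ops (FP, BP, WG):", (fp_ops, bp_ops, wg_ops), "vs dense:", dense_ops)
```

In this toy case each phase touches only a fraction of the 12 weight entries a dense implementation would visit, which is the kind of redundant work the summary says the architecture skips. The actual savings depend on the spike rates and gradient sparsity of the real workload.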

A High Energy-Efficiency Multi-core Neuromorphic Architecture for Deep SNN Training

TripleBrain: An Edge Neuromorphic Architecture for High-accuracy Single-layer Spiking Neural Network with On-chip Self-organizing and Reinforcement Learning

Mapping Very Large Scale Spiking Neuron Network to Neuromorphic Hardware.

NBSSN: A Neuromorphic Binary Single-Spike Neural Network for Efficient Edge Intelligence.

An Asynchronous Multi-core Accelerator for SNN inference

Enabling Efficient On-Edge Spiking Neural Network Acceleration with Highly Flexible FPGA Architectures

A 22nm 0.43pJ/SOP Sparsity-Aware In-Memory Neuromorphic Computing System with Hybrid Spiking and Artificial Neural Network and Configurable Topology

Scalable NoC-based Neuromorphic Hardware Learning and Inference

Multicore Spiking Neuromorphic Chip in 180-nm with ReRAM Synapses and Digital Neurons

SPAT: FPGA-based Sparsity-Optimized Spiking Neural Network Training Accelerator with Temporal Parallel Dataflow

Building an open grid

Core interface optimization for multi-core neuromorphic processors

ActiveN: A Scalable and Flexibly-Programmable Event-Driven Neuromorphic Processor

An Energy-Efficient Deep Belief Network Processor Based on Heterogeneous Multi-Core Architecture With Transposable Memory and On-Chip Learning

An Efficient Neuromorphic Implementation of Temporal Coding-Based On-Chip STDP Learning

An Energy-Efficient Computing-in-Memory Neuromorphic System with On-Chip Training.

A 28nm Configurable Asynchronous SNN Accelerator with Energy-Efficient Learning

An End-to-End SoC for Brain-Inspired CNN-SNN Hybrid Applications

Multi-core ARM-based Hardware-Accelerated Computation for Spiking Neural Networks

A Reconfigurable FPGA-based Spiking Neural Network Accelerator

In-Hardware Learning of Multilayer Spiking Neural Networks on a Neuromorphic Processor