An Event-Based Digital Compute-In-Memory Accelerator with Flexible Operand Resolution and Layer-Wise Weight/Output Stationarity

Nicolas Chauvaux, Adrian Kneip, Christoph Posch, Kofi Makinwa, Charlotte Frenkel
2024-10-30
Abstract: Compute-in-memory (CIM) accelerators for spiking neural networks (SNNs) are promising solutions to enable $\mu$s-level inference latency and ultra-low energy in edge vision applications. Yet, their current lack of flexibility at both the circuit and system levels prevents their deployment in a wide range of real-life scenarios. In this work, we propose a novel digital CIM macro that supports arbitrary operand resolution and shape, with a unified CIM storage for weights and membrane potentials. These circuit-level techniques enable a hybrid weight- and output-stationary dataflow at the system level to maximize operand reuse, thereby minimizing costly on- and off-chip data movements during the SNN execution. Measurement results of a fabricated FlexSpIM prototype in 40-nm CMOS demonstrate a 2$\times$ increase in bit-normalized energy efficiency compared to prior fixed-precision digital CIM-SNNs, while providing resolution reconfiguration with bitwise granularity. Our approach can save up to 90% energy in large-scale systems, while reaching a state-of-the-art classification accuracy of 95.8% on the IBM DVS gesture dataset.
Hardware Architecture, Artificial Intelligence
What problem does this paper attempt to address?
This paper addresses three limitations of existing compute-in-memory (CIM) accelerators for spiking neural networks (SNNs):

1. **Fixed resolution**: CIM-based SNN accelerators usually support only a single fixed resolution or a few predefined options. This restricts the precisions that can be used and limits exploration of the trade-offs among precision, energy efficiency, and memory footprint.
2. **Fixed aspect ratio**: Conventional SNN CIM circuits restrict operand mapping to either a fully bit-serial row mapping or a fully bit-parallel column mapping. This locks the resolutions of weights and membrane potentials into a fixed ratio that cannot be adjusted.
3. **Single dataflow**: To maximize operand residence time and reduce external memory accesses, existing CIM-SNN accelerators usually support only weight stationarity (WS). This ignores the cost of moving membrane potentials, which becomes the bottleneck in some layers, especially the first few layers of models such as ResNet, where large feature maps hold many membrane potentials.

To solve these problems, the paper proposes FlexSpIM, a digital CIM accelerator with the following features:

- **Arbitrary operand resolution and shape**: Weights and membrane potentials can take arbitrary resolutions, and the shape of each operand in memory can be configured flexibly, which avoids wasted memory and improves energy efficiency.
- **Hybrid stationarity dataflow**: A unified weight/membrane-potential memory allows each layer to run either weight stationary (WS) or output stationary (OS, i.e., membrane-potential stationary), whichever maximizes operand reuse and reduces data-movement overhead.

Through these innovations, FlexSpIM improves energy efficiency, saving up to 90% energy in large-scale systems, while reaching a classification accuracy of 95.8% on the IBM DVS gesture dataset.
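To make the fixed-resolution memory waste and the per-layer WS/OS choice described above more concrete, here is a minimal first-order sketch in Python. It is not the cost model used for FlexSpIM: it assumes that the stationary operand never leaves the macro, that under WS every membrane potential is read in and written back once per timestep, and that under OS every weight is streamed in once per timestep. It ignores macro capacity, tiling, and event sparsity. All names (`Layer`, `stored_bits`, `traffic_ws`, `traffic_os`, `choose_stationarity`), the layer shapes, the bit widths, and the timestep count are hypothetical illustration values.

```python
from dataclasses import dataclass


@dataclass
class Layer:
    name: str
    n_weights: int  # number of synaptic weights in the layer
    n_neurons: int  # number of membrane potentials (output neurons)
    w_bits: int     # weight resolution in bits
    v_bits: int     # membrane-potential resolution in bits


def stored_bits(n_operands, needed_bits, supported=None):
    """Macro bits occupied by n operands (hypothetical helper).

    supported=None models bitwise-granular resolution; otherwise each operand
    is rounded up to the smallest supported width, as in fixed-option designs.
    """
    if supported is None:
        return n_operands * needed_bits
    fitting = [b for b in supported if b >= needed_bits]
    if not fitting:
        raise ValueError(f"{needed_bits}-bit operands are not supported")
    return n_operands * min(fitting)


def traffic_ws(layer, timesteps):
    # Assumed WS model: weights are loaded once and stay resident, while
    # membrane potentials are read in and written back at every timestep.
    weights = layer.n_weights * layer.w_bits
    potentials = 2 * timesteps * layer.n_neurons * layer.v_bits
    return weights + potentials


def traffic_os(layer, timesteps):
    # Assumed OS model: membrane potentials stay resident for the whole
    # sample, while weights are streamed in at every timestep.
    return timesteps * layer.n_weights * layer.w_bits


def choose_stationarity(layers, timesteps):
    """Pick WS or OS per layer by minimizing the modeled data movement."""
    plan = {}
    for layer in layers:
        ws = traffic_ws(layer, timesteps)
        os_cost = traffic_os(layer, timesteps)
        plan[layer.name] = ("OS", os_cost) if os_cost < ws else ("WS", ws)
    return plan


if __name__ == "__main__":
    layers = [
        # Early layer: large feature map, few channels -> many potentials.
        Layer("conv1", n_weights=3 * 3 * 2 * 16, n_neurons=16 * 64 * 64,
              w_bits=4, v_bits=8),
        # Deep layer: small feature map, many channels -> many weights.
        Layer("conv4", n_weights=3 * 3 * 128 * 128, n_neurons=128 * 8 * 8,
              w_bits=4, v_bits=8),
    ]
    for name, (mode, bits) in choose_stationarity(layers, timesteps=10).items():
        print(f"{name}: {mode}, {bits} bits moved")
    # Footprint of 5-bit weights: bitwise granularity vs. {4, 8}-bit options.
    print(stored_bits(147456, 5), stored_bits(147456, 5, supported=(4, 8)))
```

Under these assumptions the sketch selects OS for the early layer (11,520 bits moved instead of about 10.5 Mbit under WS) and WS for the deep layer (1,900,544 bits instead of 5,898,240 under OS). It also shows 5-bit weights occupying 737,280 bits with bitwise granularity versus 1,179,648 bits when they must be rounded up to 8 bits. A real mapper would also have to check that the stationary operand fits in the macro and account for sparse, event-driven activity.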