Abstract: Compute-in-memory (CIM) accelerators for spiking neural networks (SNNs) are promising solutions to enable $\mu$s-level inference latency and ultra-low energy in edge vision applications. Yet, their current lack of flexibility at both the circuit and system levels prevents their deployment in a wide range of real-life scenarios. In this work, we propose a novel digital CIM macro that supports arbitrary operand resolution and shape, with a unified CIM storage for weights and membrane potentials. These circuit-level techniques enable a hybrid weight- and output-stationary dataflow at the system level to maximize operand reuse, thereby minimizing costly on- and off-chip data movements during the SNN execution. Measurement results of a fabricated FlexSpIM prototype in 40-nm CMOS demonstrate a 2$\times$ increase in bit-normalized energy efficiency compared to prior fixed-precision digital CIM-SNNs, while providing resolution reconfiguration with bitwise granularity. Our approach can save up to 90% energy in large-scale systems, while reaching a state-of-the-art classification accuracy of 95.8% on the IBM DVS gesture dataset.

What problem does this paper attempt to address?

This paper addresses three limitations of existing compute-in-memory (CIM) accelerators for spiking neural networks (SNNs):

1. **Fixed resolution**: Existing CIM-based SNN accelerators usually support only a single fixed resolution or a few predefined options. This prevents them from matching different precision requirements and restricts exploration of the trade-offs among precision, energy efficiency, and memory footprint.
2. **Fixed aspect ratio**: Conventional SNN CIM circuits restrict operand mapping to either fully bit-serial row mapping or fully bit-parallel column mapping. This locks the resolutions of weights and membrane potentials into a fixed ratio that cannot be adjusted.
3. **Single dataflow**: To maximize operand residence time and reduce external memory accesses, existing CIM-SNN accelerators usually support only the weight-stationary (WS) dataflow. This ignores the cost of moving membrane potentials, which becomes a bottleneck especially in the first few layers of models such as ResNet.

To solve these problems, the paper proposes a digital CIM accelerator named FlexSpIM, which has the following features:

- **Arbitrary resolution and operand shape**: It supports weights and membrane potentials of arbitrary resolution and lets the operand shape be configured flexibly, which avoids wasted memory and improves energy efficiency.
- **Hybrid stationarity dataflow**: A unified memory for weights and membrane potentials lets each layer select either weight stationarity (WS) or output stationarity (OS, i.e., membrane-potential stationarity), maximizing operand reuse and reducing data-movement overhead.

With these techniques, the fabricated 40-nm FlexSpIM prototype achieves a 2$\times$ increase in bit-normalized energy efficiency over prior fixed-precision digital CIM-SNNs. The approach can save up to 90% energy in large-scale systems and reaches a classification accuracy of 95.8% on the IBM DVS gesture dataset.
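The layer-wise choice between WS and OS can be illustrated with a toy cost model. The sketch below is not the paper's method: it assumes, purely for illustration, that the non-resident operand is moved once per timestep (membrane potentials are read and written back under WS, weights are streamed in under OS) and that the better dataflow is the one moving fewer bits. The `Layer` class, the `moved_bits` and `pick_dataflow` functions, and the layer sizes and bit widths are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Layer:
    name: str
    n_weights: int  # number of synaptic weights in the layer
    n_neurons: int  # number of output neurons (one membrane potential each)
    w_bits: int     # weight resolution in bits (hypothetical value)
    v_bits: int     # membrane-potential resolution in bits (hypothetical value)


def moved_bits(layer: Layer, dataflow: str) -> int:
    """Bits moved in or out of the CIM array per timestep (assumed toy model)."""
    if dataflow == "WS":
        # Weights stay resident; potentials are read and written back.
        return 2 * layer.n_neurons * layer.v_bits
    if dataflow == "OS":
        # Potentials stay resident; weights are streamed in.
        return layer.n_weights * layer.w_bits
    raise ValueError(f"unknown dataflow: {dataflow}")


def pick_dataflow(layer: Layer) -> str:
    """Choose the dataflow that moves fewer bits; ties go to WS."""
    return min(("WS", "OS"), key=lambda d: moved_bits(layer, d))


# Hypothetical layers: an early convolution with a large feature map and few
# weights, and a late fully connected layer with many weights and few neurons.
early_conv = Layer("conv1", n_weights=3 * 3 * 2 * 16, n_neurons=64 * 64 * 16,
                   w_bits=4, v_bits=9)
late_fc = Layer("fc", n_weights=512 * 11, n_neurons=11, w_bits=4, v_bits=9)

for layer in (early_conv, late_fc):
    print(layer.name, pick_dataflow(layer))
```

Under these assumed numbers the early layer favors OS and the late layer favors WS, in line with the summary's observation that membrane-potential movement dominates in the first few layers. A real mapping decision would also depend on array capacity, spike sparsity, and access energies, none of which this sketch models.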

An Event-Based Digital Compute-In-Memory Accelerator with Flexible Operand Resolution and Layer-Wise Weight/Output Stationarity
