Abstract:Computing-in-memory (CIM) relieves the Von Neumann bottleneck by storing the weights of neural networks in memory arrays. However, two challenges still exist, hindering the efficient acceleration of convolutional neural networks (CNN) in artificial intelligence (AI) edge devices. Firstly, the activations for sliding window (SW) operations in CNN still bring high memory access pressure. This can be alleviated by increasing the SW parallelism, but simple array replication suffers from poor array utilization and large peripheral circuits overhead. Secondly, the partial sums from individual CIM arrays, which are usually accumulated to obtain the final sum, introduce large latency due to enormous shift-and-add operations. Moreover, high-resolution ADCs are also needed to reduce the quantization error of partial sums, further increasing the hardware costs. In this paper, a hardware-efficient CIM accelerator, ARBiS, is proposed with improved activation reusability and bit-scalable matrix-vector-multiplication (MVM) for CNN acceleration in AI edge applications. The cyclic-shift weight duplication exploits a third dimension of receptive field (RF) depth for SW weight mapping to reduce the memory accesses of activations, improving the array utilization. The parasitic-capacitance charge sharing is employed to realize high-precision analog MVM in order to reduce the ADC cost. Compared with conventional architectures, ARBiS with parallel processing of 9 SW operations achieves 56.6%~58.8% alleviation of memory access pressure. Meanwhile, ARBiS configured with 8-bit ADCs saves 92.53%~94.53% ADC energy consumption. An ARBiS accelerator is evaluated to realize a computational efficiency (CE) of 10.28 (10.43) TOPS/mm2, an energy efficiency (EE) of 91.19 (112.36) TOPS/W with 8-bit (4-bit) ADCs, achieving $11.4\sim 11.7\times $ ( $11.6\sim 11.8\times $ ), $1.1\sim 3.3\times $ ( $1.4\sim 4\times $ ) improvements over state-of-the-art works, respectively.

SUN: Dynamic Hybrid-Precision SRAM-Based CIM Accelerator With High Macro Utilization Using Structured Pruning Mixed-Precision Networks

A Low-Power In-Memory Multiplication and Accumulation Array with Modified Radix-4 Input and Canonical Signed Digit Weights

34.3 A 22nm 64kb Lightning-Like Hybrid Computing-in-Memory Macro with a Compressed Adder Tree and Analog-Storage Quantizers for Transformer and CNNs.

A 65 Nm 73 Kb SRAM-Based Computing-In-Memory Macro with Dynamic-Sparsity Controlling

24.5 A Twin-8T SRAM Computation-In-Memory Macro for Multiple-Bit CNN-Based Machine Learning

A Weight-Reload-Eliminated Compute-in-Memory Accelerator for 60 fps 4K Super-Resolution

SSM-CIM: an Efficient CIM Macro Featuring Single-Step Multi-bit MAC Computation for CNN Edge Inference

SPCIM: Sparsity-Balanced Practical CIM Accelerator with Optimized Spatial-Temporal Multi-Macro Utilization

A 4-Kb 1-to-8-bit Configurable 6T SRAM-Based Computation-in-Memory Unit-Macro for CNN-Based AI Edge Processors

An 11T1C Bit-Level-Sparsity-Aware Computing-in-Memory Macro with Adaptive Conversion Time and Computation Voltage

SRAM-Based CIM Architecture Design for Event Detection

An Energy-Efficient Mixed-Bit CNN Accelerator With Column Parallel Readout for ReRAM-Based In-Memory Computing

A Systolic Computing-in-Memory Array Based Accelerator with Predictive Early Activation for Spatiotemporal Convolutions

OSA-HCIM: On-The-Fly Saliency-Aware Hybrid SRAM CIM with Dynamic Precision Configuration

14.3 A 65nm Computing-in-Memory-Based CNN Processor with 2.9-to-35.8tops/w System Energy Efficiency Using Dynamic-Sparsity Performance-Scaling Architecture and Energy-Efficient Inter/Intra-Macro Data Reuse.

CIMQ: A Hardware-Efficient Quantization Framework for Computing-In-Memory Based Neural Network Accelerators

A 28-nm 64-kb 31.6-TFLOPS/W Digital-Domain Floating-Point-Computing-Unit and Double-Bit 6T-SRAM Computing-in-Memory Macro for Floating-Point CNNs

A Twin-8T SRAM Computation-in-Memory Unit-Macro for Multibit CNN-Based AI Edge Processors

CAP-RAM: A Charge-Domain In-Memory Computing 6T-SRAM for Accurate and Precision-Programmable CNN Inference

ARBiS: A Hardware-Efficient SRAM CIM CNN Accelerator with Cyclic-Shift Weight Duplication and Parasitic-Capacitance Charge Sharing for AI Edge Application

CIMFormer: A Systolic CIM-Array-Based Transformer Accelerator with Token-Pruning-Aware Attention Reformulating and Principal Possibility Gathering