Abstract: Computing-in-memory (CIM) relieves the von Neumann bottleneck by storing the weights of neural networks in memory arrays. However, two challenges still hinder the efficient acceleration of convolutional neural networks (CNNs) in artificial intelligence (AI) edge devices. First, the activations for sliding-window (SW) operations in CNNs still impose high memory access pressure. This pressure can be alleviated by increasing SW parallelism, but simple array replication suffers from poor array utilization and large peripheral-circuit overhead. Second, the partial sums from individual CIM arrays, which are usually accumulated to obtain the final sum, introduce large latency due to numerous shift-and-add operations. Moreover, high-resolution analog-to-digital converters (ADCs) are needed to reduce the quantization error of the partial sums, further increasing hardware cost. In this paper, a hardware-efficient CIM accelerator, ARBiS, is proposed with improved activation reusability and bit-scalable matrix-vector multiplication (MVM) for CNN acceleration in AI edge applications. Cyclic-shift weight duplication exploits receptive-field (RF) depth as a third dimension for SW weight mapping, reducing activation memory accesses and improving array utilization. Parasitic-capacitance charge sharing is employed to realize high-precision analog MVM and thereby reduce ADC cost. Compared with conventional architectures, ARBiS with parallel processing of 9 SW operations reduces memory access pressure by 56.6% to 58.8%. Meanwhile, ARBiS configured with 8-bit ADCs saves 92.53% to 94.53% of ADC energy consumption. An ARBiS accelerator is evaluated to achieve a computational efficiency (CE) of 10.28 (10.43) TOPS/mm$^2$ and an energy efficiency (EE) of 91.19 (112.36) TOPS/W with 8-bit (4-bit) ADCs, corresponding to $11.4\sim 11.7\times$ ($11.6\sim 11.8\times$) CE and $1.1\sim 3.3\times$ ($1.4\sim 4\times$) EE improvements over state-of-the-art works.
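To see why processing several sliding windows in parallel lowers activation memory traffic, the sketch below counts input-activation reads for a k x k convolution. It is a simplified model under stated assumptions, not the ARBiS cyclic-shift mapping: the function name `activation_reads` and its parameters are hypothetical, p horizontally adjacent windows are assumed to fetch the union of their receptive fields once, there is no padding, stride is assumed to be at most k, and no data is reused across output rows.

```python
# Idealized, hypothetical model of sliding-window (SW) activation fetches;
# NOT the ARBiS cyclic-shift weight mapping described in the abstract.


def activation_reads(h: int, w: int, c: int, k: int, p: int, stride: int = 1) -> int:
    """Count input-activation reads for a k x k convolution (no padding) on an
    h x w x c input when p horizontally adjacent windows are processed in
    parallel and fetch the union of their receptive fields once.
    p = 1 is the window-at-a-time baseline. Assumes stride <= k."""
    out_h = (h - k) // stride + 1
    out_w = (w - k) // stride + 1
    reads = 0
    for _ in range(out_h):
        col = 0
        while col < out_w:
            group = min(p, out_w - col)          # last group in a row may be partial
            span = (group - 1) * stride + k      # width of the merged receptive field
            reads += k * span * c                # all k rows of the k x span patch, every channel
            col += group
    return reads


if __name__ == "__main__":
    base = activation_reads(11, 11, 1, k=3, p=1)   # one window at a time
    par9 = activation_reads(11, 11, 1, k=3, p=9)   # 9 windows share one fetch
    print(base, par9)
```

For an 11 x 11 single-channel input and a 3 x 3 kernel, the baseline reads 729 activations while 9 parallel windows read 297, because overlapping receptive fields are fetched only once. These counts come from the toy model and are not the paper's memory-access-pressure metric.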
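The contrast between digitizing every partial sum and combining bit columns in the analog domain can be illustrated with an idealized model. The sketch below assumes unsigned inputs applied bit-serially, unsigned weights stored one bit per column, and an ideal (infinite-resolution) ADC. The charge-sharing step is a generic successive equal-capacitor averaging scheme chosen for illustration; it is not claimed to be the parasitic-capacitance circuit used in ARBiS, and all function names are hypothetical.

```python
# Idealized, hypothetical model of bit-sliced CIM MVM; NOT the ARBiS circuit.
# Weights are unsigned and stored one bit per column; inputs are applied bit-serially.
from typing import List, Tuple


def column_sums(acts: List[int], weights: List[int], i: int, w_bits: int) -> List[int]:
    """Analog column outputs for input bit i: one sum per weight-bit column."""
    return [
        sum(((a >> i) & 1) * ((w >> j) & 1) for a, w in zip(acts, weights))
        for j in range(w_bits)
    ]


def mvm_shift_add(acts, weights, a_bits=4, w_bits=4) -> Tuple[int, int]:
    """Digitize every column partial sum, then shift-and-add digitally.
    Returns (dot product, number of ADC conversions)."""
    total, conversions = 0, 0
    for i in range(a_bits):
        for j, col in enumerate(column_sums(acts, weights, i, w_bits)):
            conversions += 1                  # one ideal ADC read per partial sum
            total += col << (i + j)           # digital shift-and-add
    return total, conversions


def charge_share(cols: List[int]) -> float:
    """Successive equal-capacitor charge sharing, LSB column first.
    Each step averages the stored voltage with the next column's voltage,
    so after n steps v = sum_j cols[j] * 2**(j - n)."""
    v = 0.0
    for c in cols:
        v = (v + c) / 2.0
    return v


def mvm_charge_share(acts, weights, a_bits=4, w_bits=4) -> Tuple[int, int]:
    """Combine weight-bit columns in the analog domain, then digitize once per input bit."""
    total, conversions = 0, 0
    for i in range(a_bits):
        v = charge_share(column_sums(acts, weights, i, w_bits))
        conversions += 1                      # one ideal ADC read per input bit
        total += round(v * 2**w_bits) << i    # rescale, then shift across input bits only
    return total, conversions


if __name__ == "__main__":
    acts, weights = [3, 7, 1, 15], [5, 2, 9, 4]
    print(sum(a * w for a, w in zip(acts, weights)))
    print(mvm_shift_add(acts, weights))
    print(mvm_charge_share(acts, weights))
```

Both paths return the same dot product (98 for the sample vectors), but with 4-bit weights the analog-combined path needs 4 ADC conversions instead of 16, and far fewer digital shift-and-add operations. A real design must also handle finite ADC resolution, nonideal capacitance, and signed operands, none of which this toy model captures.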

Pod-racing: bulk-bitwise to floating-point compute in racetrack memory for machine learning at the edge

FPIRM: Floating-point Processing in Racetrack Memories

A Low-Power In-Memory Multiplication and Accumulation Array with Modified Radix-4 Input and Canonical Signed Digit Weights

A 28-nm 64-kb 31.6-TFLOPS/W Digital-Domain Floating-Point-Computing-Unit and Double-Bit 6T-SRAM Computing-in-Memory Macro for Floating-Point CNNs

An ADC-less RRAM-based Computing-in-Memory Macro with Binary CNN for Efficient Edge AI

A 28nm 128TFLOPS/W Computing-In-Memory Engine Supporting One-Shot Floating-Point NN Inference and On-Device Fine-Tuning for Edge AI

SRAM-Based Processing-In-Memory Design with Kullback-Leibler Divergence-Based Dynamic Precision Quantization

An Energy-Efficient Quantized and Regularized Training Framework for Processing-In-Memory Accelerators

CAP-RAM: A Charge-Domain In-Memory Computing 6T-SRAM for Accurate and Precision-Programmable CNN Inference

An Emerging NVM CIM Accelerator with Shared-Path Transpose Read and Bit-Interleaving Weight Storage for Efficient On-Chip Training in Edge Devices

Cambricon-M: A Fibonacci-Coded Charge-Domain SRAM-Based CIM Accelerator for DNN Inference

ARBiS: A Hardware-Efficient SRAM CIM CNN Accelerator with Cyclic-Shift Weight Duplication and Parasitic-Capacitance Charge Sharing for AI Edge Application

A Configurable Multi-Precision CNN Computing Framework Based on Single Bit RRAM

A high-speed reusable quantized hardware accelerator design for CNN on constrained edge device

Straightforward Data Transfer in a Blockwise Dataflow for an Analog RRAM-based CIM System

An Energy-Efficient Floating-Point Compute SRAM with Pipelined In-Memory Bit-Parallel Exponent and Bitwise Mantissa Processing

Voxel-CIM: An Efficient Compute-in-Memory Accelerator for Voxel-based Point Cloud Neural Networks

24.5 A Twin-8T SRAM Computation-In-Memory Macro for Multiple-Bit CNN-Based Machine Learning

Neural-PIM: Efficient Processing-In-Memory with Neural Approximation of Peripherals

CIMQ: A Hardware-Efficient Quantization Framework for Computing-In-Memory Based Neural Network Accelerators

ConvPIM: Evaluating Digital Processing-in-Memory through Convolutional Neural Network Acceleration