Abstract:Computing-in-memory (CIM) relieves the Von Neumann bottleneck by storing the weights of neural networks in memory arrays. However, two challenges still exist, hindering the efficient acceleration of convolutional neural networks (CNN) in artificial intelligence (AI) edge devices. Firstly, the activations for sliding window (SW) operations in CNN still bring high memory access pressure. This can be alleviated by increasing the SW parallelism, but simple array replication suffers from poor array utilization and large peripheral circuits overhead. Secondly, the partial sums from individual CIM arrays, which are usually accumulated to obtain the final sum, introduce large latency due to enormous shift-and-add operations. Moreover, high-resolution ADCs are also needed to reduce the quantization error of partial sums, further increasing the hardware costs. In this paper, a hardware-efficient CIM accelerator, ARBiS, is proposed with improved activation reusability and bit-scalable matrix-vector-multiplication (MVM) for CNN acceleration in AI edge applications. The cyclic-shift weight duplication exploits a third dimension of receptive field (RF) depth for SW weight mapping to reduce the memory accesses of activations, improving the array utilization. The parasitic-capacitance charge sharing is employed to realize high-precision analog MVM in order to reduce the ADC cost. Compared with conventional architectures, ARBiS with parallel processing of 9 SW operations achieves 56.6%~58.8% alleviation of memory access pressure. Meanwhile, ARBiS configured with 8-bit ADCs saves 92.53%~94.53% ADC energy consumption. An ARBiS accelerator is evaluated to realize a computational efficiency (CE) of 10.28 (10.43) TOPS/mm2, an energy efficiency (EE) of 91.19 (112.36) TOPS/W with 8-bit (4-bit) ADCs, achieving $11.4\sim 11.7\times $ ( $11.6\sim 11.8\times $ ), $1.1\sim 3.3\times $ ( $1.4\sim 4\times $ ) improvements over state-of-the-art works, respectively.

Partial Sum Quantization for Computing-In-Memory-Based Neural Network Accelerator

Exploring Bit-Level Sparsity for Partial Sum Quantization in Computing-In-Memory Accelerator

CIMQ: A Hardware-Efficient Quantization Framework for Computing-In-Memory Based Neural Network Accelerators

A Convolutional Neural Network Accelerator Architecture with Fine-Granular Mixed Precision Configurability.

CIM2PQ: an Array-Wise and Hardware-Friendly Mixed Precision Quantization Method for Analog Computing-In-Memory

An Energy-Efficient Quantized and Regularized Training Framework for Processing-In-Memory Accelerators

MixMixQ: Quantization with Mixed Bit-Sparsity and Mixed Bit-Width for CIM Accelerators

StoX-Net: Stochastic Processing of Partial Sums for Efficient In-Memory Computing DNN Accelerators

Quantization and sparsity-aware processing for energy-efficient NVM-based convolutional neural networks

IMAGINE: An 8-to-1b 22nm FD-SOI Compute-In-Memory CNN Accelerator With an End-to-End Analog Charge-Based 0.15-8POPS/W Macro Featuring Distribution-Aware Data Reshaping

Crossbar-Aligned & Integer-Only Neural Network Compression for Efficient In-Memory Acceleration

Specific ADC of NVM-Based Computation-in-Memory for Deep Neural Networks

APIM: An Antiferromagnetic MRAM-Based Processing-In-Memory System for Efficient Bit-level Operations of Quantized Convolutional Neural Networks

Low Quantization Error Readout Circuit with Fully Charge-Domain Calculation for Computation-in-Memory Deep Neural Network

ARBiS: A Hardware-Efficient SRAM CIM CNN Accelerator with Cyclic-Shift Weight Duplication and Parasitic-Capacitance Charge Sharing for AI Edge Application

Improving the accuracy of neural networks in analog computing-in-memory systems by analog weight.

A 28-Nm 135.19 TOPS/W Bootstrapped-SRAM Compute-in-Memory Accelerator with Layer-Wise Precision and Sparsity

Mixed Precision Quantization for ReRAM-based DNN Inference Accelerators

HCiM: ADC-Less Hybrid Analog-Digital Compute in Memory Accelerator for Deep Learning Workloads

Design and Implementation of a Charge-Sharing In-Memory-computing Macro with Sparse Feature for Quantized Neural Network

An Energy-Efficient Mixed-Bit ReRAM-based Computing-in-Memory CNN Accelerator with Fully Parallel Readout.