Abstract: Quantized neural networks (QNNs), which perform multiply-accumulate (MAC) operations with low-precision weights or activations, have been widely used to reduce energy consumption. QNNs trade energy consumption against accuracy depending on the quantized precision, so an appropriate precision must be selected for energy efficiency. However, conventional hardware accelerators such as the Google TPU are typically designed and optimized for a specific precision (e.g., 8-bit), which may degrade energy efficiency at other precisions. Although analog-based computing-in-memory (CIM) technology supporting variable precision has been proposed to improve energy efficiency, its implementation requires extremely large and power-consuming analog-to-digital converters (ADCs). In this paper, we propose Scale-CIM, a precision-scalable CIM architecture that supports MAC operations based on digital (not analog) computations. Scale-CIM performs binary MAC operations with high parallelism by executing digital multiplication operations in the CIM array and accumulation operations in the peripheral logic. In addition, Scale-CIM supports multi-bit MAC operations without ADCs by combining the binary MAC operations with shift operations that depend on the precision. Since Scale-CIM fully utilizes the CIM array for various quantized precisions (not for a specific precision), it achieves high compute throughput. Consequently, Scale-CIM enables precision-scalable CIM-based MAC operations with high parallelism. Our simulation results show that Scale-CIM achieves a 1.5× to 15.8× speedup and reduces system energy consumption by 53.7% to 95.7% across different quantized precisions, compared to the state-of-the-art precision-scalable accelerator.
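The idea of building a multi-bit MAC from binary MACs and shifts can be illustrated with a minimal software sketch. It is not the Scale-CIM implementation: it assumes unsigned operands and a plain bit-plane decomposition, and the function names (`bit_plane`, `binary_mac`, `multibit_mac`) are hypothetical. The 1-bit multiply is modeled as an AND (the part the abstract places in the CIM array), while the summation and shifting are modeled as ordinary integer operations (the part the abstract places in the peripheral logic).

```python
def bit_plane(values, bit):
    """Extract one bit position from every operand (assumes unsigned integers)."""
    return [(v >> bit) & 1 for v in values]


def binary_mac(a_bits, w_bits):
    """Binary MAC: a 1-bit multiply is an AND, and accumulation is a count of ones."""
    return sum(a & w for a, w in zip(a_bits, w_bits))


def multibit_mac(activations, weights, a_precision, w_precision):
    """Compose a multi-bit MAC from binary MACs, each shifted by its bit weight."""
    acc = 0
    for i in range(a_precision):
        a_plane = bit_plane(activations, i)
        for j in range(w_precision):
            w_plane = bit_plane(weights, j)
            # The partial sum for bit pair (i, j) carries a weight of 2**(i + j).
            acc += binary_mac(a_plane, w_plane) << (i + j)
    return acc


# Illustrative 2-bit operands (made-up values, not data from the paper).
activations = [3, 1, 2, 0]
weights = [2, 3, 1, 3]
result = multibit_mac(activations, weights, a_precision=2, w_precision=2)
reference = sum(a * w for a, w in zip(activations, weights))
print(result, reference)
```

In this sketch the number of binary MAC passes grows as the product of the two precisions, so lowering the precision directly reduces the work while the same binary MAC primitive is reused unchanged.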

Toggle Rate Aware Quantization Model Based on Digital Floating-Point Computing-in-Memory Architecture

A Low-Power In-Memory Multiplication and Accumulation Array with Modified Radix-4 Input and Canonical Signed Digit Weights

An Overview of Computing-in-Memory Interfaces

An 8-Bit in Resistive Memory Computing Core with Regulated Passive Neuron and Bitline Weight Mapping

In-Memory Multi-Bit Multiplication and Accumulation (MAC) Using FeFET for Energy Efficient IoT

A 28-nm Floating-Point Computing-in-Memory Processor Using Intensive-CIM Sparse-Digital Architecture

A 19.7 TFLOPS/W Multiply-less Logarithmic Floating-Point CIM Architecture with Error-Reduced Compensated Approximate Adder

CIMQ: A Hardware-Efficient Quantization Framework for Computing-In-Memory Based Neural Network Accelerators

A 28nm 16.9-300TOPS/W Computing-in-Memory Processor Supporting Floating-Point NN Inference/Training with Intensive-CIM Sparse-Digital Architecture

A 28nm 314.6TFLOPS/W Reconfigurable Floating-Point Analog Compute-In-Memory Macro with Exponent Approximation and Two-Stage Sharing TD-ADC

Scale-CIM: Precision-Scalable Computing-in-Memory for Energy-Efficient Quantized Neural Networks

AFPR-CIM: An Analog-Domain Floating-Point RRAM-based Compute-In-Memory Architecture with Dynamic Range Adaptive FP-ADC

TT@CIM: A Tensor-Train In-Memory-Computing Processor Using Bit-Level-Sparsity Optimization and Variable Precision Quantization

A 28nm 128TFLOPS/W Computing-In-Memory Engine Supporting One-Shot Floating-Point NN Inference and On-Device Fine-Tuning for Edge AI

A Reconfigurable Floating-Point Compute-In-Memory with Analog Exponent Pre-Processes

A 28-nm 64-kb 31.6-TFLOPS/W Digital-Domain Floating-Point-Computing-Unit and Double-Bit 6T-SRAM Computing-in-Memory Macro for Floating-Point CNNs

A 1.97 TFLOPS/W Configurable SRAM-Based Floating-Point Computation-in-Memory Macro for Energy-Efficient AI Chips

ReDCIM: Reconfigurable Digital Computing-in-Memory Processor with Unified FP/INT Pipeline for Cloud AI Acceleration

An Energy-Efficient Floating-Point Compute SRAM with Pipelined In-Memory Bit-Parallel Exponent and Bitwise Mantissa Processing

BR-CIM: An Efficient Binary Representation Computation-In-Memory Design