Abstract:Computing-in-memory (CIM) chips have demonstrated promising high energy efficiency on multiply–accumulate (MAC) operations for artificial intelligence (AI) applications. Though integral (INT) CIM chips are emerging, the floating-point (FP) CIM chip has not been well explored. The high-accuracy demand of larger models and complex tasks requires FP computation. Besides, most of the neural network (NN) training tasks still rely on FP computation. This work presents an energy-efficient FP CIM processor. It is observed that most of the exponent values of FP data are concentrated in a small region. Therefore, the FP computations are divided into intensive and sparse parts and then executed on an intensive-CIM sparse-digital architecture. First, an FP-to-INT CIM workflow for the intensive FP operations is designed to reduce the CIM execution cycles. Second, a flexible sparse-digital core is proposed for the remaining sparse FP operations. Utilizing both the intensive-CIM and sparse-digital cores, this work can achieve both high energy efficiency and identical accuracy to the FP algorithm baseline. Considering the FP CIM execution flow, a CIM-friendly low-bit FP training method is proposed to further reduce the execution cycles. Besides, a low-MAC-value (MACV) CIM macro is designed to utilize the more random sparsity brought by FP alignment. The 28-nm fabricated chip shows 275–1615-TOPS/W@INT4 and 17.2–91.3-TOPS/W@FP16 macro energy efficiency from dense to the average sparsity on the tested models.

A 28nm 314.6TLFOPS/W Reconfigurable Floating-Point Analog Compute-In-Memory Macro with Exponent Approximation and Two-Stage Sharing TD-ADC

A Low-Power In-Memory Multiplication and Accumulation Array with Modified Radix-4 Input and Canonical Signed Digit Weights

A Robust 8-Bit Non-Volatile Computing-in-Memory Core for Low-Power Parallel MAC Operations.

A Reconfigurable Floating-Point Compute-In-Memory with Analog Exponent Pre-Processes

A 1.97 TFLOPS/W Configurable SRAM-Based Floating-Point Computation-in-Memory Macro for Energy-Efficient AI Chips.

A 28-nm Floating-Point Computing-in-Memory Processor Using Intensive-CIM Sparse-Digital Architecture

A 28nm 16.9-300TOPS/W Computing-in-Memory Processor Supporting Floating-Point NN Inference/Training with Intensive-CIM Sparse-Digital Architecture

In-Memory Multi-Bit Multiplication and Accumulation (MAC) Using FeFET for Energy Efficient IoT

A 19.7 TFLOPS/W Multiply-less Logarithmic Floating-Point CIM Architecture with Error-Reduced Compensated Approximate Adder

GCFP-ACIM: A 40nm 4.74TFLOPS/W General Complex Float-Point Analog Compute-in-Memory with Adaptive Power-Saving for HDR Signal Processing Applications

An 8.8 TFLOPS/W Floating-Point RRAM-Based Compute-in-Memory Macro Using Low Latency Triangle-Style Mantissa Multiplication

A 28-nm 64-kb 31.6-TFLOPS/W Digital-Domain Floating-Point-Computing-Unit and Double-Bit 6T-SRAM Computing-in-Memory Macro for Floating-Point CNNs

34.3 A 22nm 64kb Lightning-Like Hybrid Computing-in-Memory Macro with a Compressed Adder Tree and Analog-Storage Quantizers for Transformer and CNNs.

A High-Density and Reconfigurable SRAM-Based Digital Compute-In-Memory Macro for Low-Power AI Chips.

A 28nm 128TFLOPS/W Computing-In-Memory Engine Supporting One-Shot Floating-Point NN Inference and On-Device Fine-Tuning for Edge AI

AFPR-CIM: An Analog-Domain Floating-Point RRAM-based Compute-In-Memory Architecture with Dynamic Range Adaptive FP-ADC

7.8 A 22nm Delta-Sigma Computing-In-Memory (Δ∑CIM) SRAM Macro with Near-Zero-Mean Outputs and LSB-First ADCs Achieving 21.38TOPS/W for 8b-MAC Edge AI Processing

A 137.5 TOPS/W SRAM Compute-in-Memory Macro with 9-b Memory Cell-Embedded ADCs and Signal Margin Enhancement Techniques for AI Edge Applications

A 28nm 32kb SRAM Computing-in-Memory Macro with Hierarchical Capacity Attenuator and Input Sparsity-Optimized ADC for 4b Mac Operation

An Energy-Efficient Floating-Point Compute SRAM with Pipelined In-Memory Bit-Parallel Exponent and Bitwise Mantissa Processing