Abstract: Edge computing is a promising solution for handling high-dimensional, multispectral analog data from sensors and IoT devices in applications such as autonomous drones. However, the limited storage and computing resources of edge devices make it challenging to perform complex predictive modeling at the edge. Compute-in-memory (CiM) has emerged as a principal paradigm for minimizing the energy of deep learning-based inference at the edge. Nevertheless, integrating storage and processing complicates the memory cells and/or memory peripherals, essentially trading off area efficiency for energy efficiency. This paper proposes a novel solution to improve area efficiency in deep learning inference tasks. The proposed method employs two key strategies. First, a frequency-domain learning approach uses binarized Walsh-Hadamard transforms, reducing the number of DNN parameters (by 87% in MobileNetV2) and enabling compute-in-SRAM, which better utilizes parallelism during inference. Second, a memory-immersed collaborative digitization method among CiM arrays reduces the area overhead of conventional ADCs. This allows more CiM arrays to fit in limited-footprint designs, leading to better parallelism and fewer external memory accesses. Different networking configurations are explored, in which Flash, successive approximation (SA), and hybrid digitization steps can be implemented using the memory-immersed scheme. The results are demonstrated using a 65 nm CMOS test chip, which exhibits significant area and energy savings compared to a 40 nm-node 5-bit SAR ADC and a 5-bit Flash ADC. By processing analog data more efficiently, it is possible to selectively retain valuable data from sensors and alleviate the challenges posed by the analog data deluge.
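As an illustration of why a Walsh-Hadamard transform suits in-memory hardware, the following minimal Python sketch computes the transform with additions and subtractions only. It is not the paper's implementation: the function names `fwht` and `hadamard`, the integer input samples, and the natural (Hadamard) output ordering are all assumptions made for this example. Because the transform matrix is fixed with entries of +1 and -1, it requires no multiplications and no stored weights.

```python
import numpy as np

def fwht(x):
    """Unnormalized fast Walsh-Hadamard transform in natural (Hadamard) order.

    Hypothetical helper for illustration; assumes integer samples whose
    count is a power of two.
    """
    y = np.array(x, dtype=np.int64)
    n = y.shape[0]
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of two")
    h = 1
    while h < n:
        # Butterfly stage: combine blocks of size h using only add/subtract.
        for start in range(0, n, 2 * h):
            a = y[start:start + h].copy()
            b = y[start + h:start + 2 * h].copy()
            y[start:start + h] = a + b
            y[start + h:start + 2 * h] = a - b
        h *= 2
    return y

def hadamard(n):
    """Sylvester-construction Hadamard matrix with +1/-1 entries (n a power of two)."""
    H = np.array([[1]], dtype=np.int64)
    while H.shape[0] < n:
        H = np.block([[H, H], [H, -H]])
    return H

if __name__ == "__main__":
    samples = np.array([3, -1, 4, 1, -5, 9, 2, -6])
    coeffs = fwht(samples)
    # The fast transform agrees with the explicit matrix-vector product.
    print(np.array_equal(coeffs, hadamard(8) @ samples))
    # Applying the transform twice returns the input scaled by n.
    print(np.array_equal(fwht(coeffs) // 8, samples))
```

The fast form needs n log2(n) additions and subtractions instead of the n^2 operations of the explicit matrix product, and the transform is its own inverse up to a factor of n.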

Containing Analog Data Deluge at Edge through Frequency-Domain Compression in Collaborative Compute-in-Memory Networks
