Abstract: The widespread adoption of data-centric algorithms, particularly Artificial Intelligence (AI) and Machine Learning (ML), has exposed the limitations of centralized processing infrastructures, driving a shift towards edge computing. This necessitates stringent constraints on energy efficiency, which traditional von Neumann architectures struggle to meet. The Compute-In-Memory (CIM) paradigm has emerged as a superior candidate due to its efficient exploitation of available memory bandwidth. However, existing CIM solutions require high implementation effort and lack flexibility from a software integration standpoint. This work proposes a novel, software-friendly, general-purpose, and low-integration-effort Near-Memory Computing (NMC) approach, paving the way for the adoption of CIM-based systems in the next generation of edge computing nodes. Two architectural variants, NM-Caesar and NM-Carus, are proposed and characterized to target different trade-offs in area efficiency, performance, and flexibility, covering a wide range of embedded microcontrollers. Post-layout simulations show up to $25.8\times$ and $50.0\times$ lower execution time and $23.2\times$ and $33.1\times$ higher energy efficiency at the system level, respectively, compared to executing the same tasks on a state-of-the-art RISC-V CPU (RV32IMC). NM-Carus achieves a peak energy efficiency of $306.7$ GOPS/W in 8-bit matrix multiplications, surpassing recent state-of-the-art in- and near-memory circuits.
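
To put the peak-efficiency figure in more familiar terms, the short sketch below converts GOPS/W into energy per operation. It relies only on the unit identity 1 GOPS/W = $10^9$ operations per joule; the function name `energy_per_op_pj` is a hypothetical helper, not something taken from the paper, and the result is a unit conversion rather than a figure the abstract reports.

```python
def energy_per_op_pj(gops_per_watt: float) -> float:
    """Convert an efficiency in GOPS/W into energy per operation in picojoules."""
    # 1 GOPS/W = 1e9 operations per second per watt = 1e9 operations per joule.
    ops_per_joule = gops_per_watt * 1e9
    # Invert to joules per operation, then scale to picojoules (1 J = 1e12 pJ).
    return 1e12 / ops_per_joule


if __name__ == "__main__":
    # 306.7 GOPS/W is the peak 8-bit matrix-multiplication figure quoted in the abstract.
    print(f"{energy_per_op_pj(306.7):.2f} pJ per operation")  # prints "3.26 pJ per operation"
```

Under this conversion, the quoted peak corresponds to roughly 3.26 pJ per 8-bit operation.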

What problem does this paper attempt to address?

The paper addresses the inefficiency of traditional von Neumann architectures in handling data-intensive workloads at the edge. With the widespread use of data-driven algorithms, especially in artificial intelligence and machine learning, the limitations of centralized processing infrastructure have become increasingly apparent, prompting a shift towards edge computing. Edge nodes, however, operate under strict energy-efficiency constraints that traditional von Neumann architectures struggle to meet. To this end, the paper proposes a Near-Memory Computing (NMC) approach that is software-friendly, general-purpose, and easy to integrate, targeting next-generation edge computing nodes.

The main contributions of the paper are:

1. A software-friendly, low-cost, and easy-to-integrate NMC approach for general-purpose low-power edge devices, validated by implementing two architectural variants that target different classes of embedded systems-on-chip (SoCs).
2. An efficient custom RISC-V Instruction Set Architecture (ISA) extension that supports flexible vector operations on a programmable CIM architecture, providing a vector view of the host's computing memory without explicit vector load/store operations.
3. An in-depth quantitative analysis of the impact and advantages of replacing traditional SRAM banks with the proposed NMC macros in low-power microcontroller units (MCUs).

The paper proposes two specific architectural variants:

- **NM-Caesar**: An area-efficient NMC unit that supports SIMD operations and is controlled by the host microcontroller, suited to regular TinyML workloads such as peak detection algorithms and lightweight artificial neural networks.
- **NM-Carus**: A fully autonomous, RISC-V programmable NMC unit with vector support, suited to highly parallel and complex TinyML applications such as deep neural networks or tasks with data-dependent control flow.

Both architectures are designed to be as close as possible to drop-in replacements for conventional embedded SRAM banks: they provide a physically and functionally SRAM-compatible interface with transparent memory operation modes. The emphasis is on programmability and ease of integration, in order to overcome the barriers that have limited adoption of existing CIM architectures.
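
As a rough illustration of what "SIMD operations" on memory words means in this context, the sketch below performs a lane-wise addition on four unsigned 8-bit values packed into one 32-bit word. This is a generic model of sub-word parallelism, not the NM-Caesar instruction set: the function name `simd_add8`, the four-lane layout, and the wraparound behaviour are all assumptions made for the example.

```python
def simd_add8(a: int, b: int) -> int:
    """Add four unsigned 8-bit lanes packed in 32-bit words, wrapping within each lane."""
    out = 0
    for lane in range(4):
        shift = 8 * lane
        x = (a >> shift) & 0xFF          # extract lane from the first operand
        y = (b >> shift) & 0xFF          # extract lane from the second operand
        # Mask the sum so a carry never spills into the neighbouring lane.
        out |= ((x + y) & 0xFF) << shift
    return out


if __name__ == "__main__":
    print(hex(simd_add8(0x01020304, 0x10203040)))  # prints 0x11223344
```

A single 32-bit access thus yields four 8-bit results, which is the kind of data-level parallelism a near-memory unit can exploit without moving operands to the CPU.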

Scalable and RISC-V Programmable Near-Memory Computing Architectures for Edge Nodes

A Robust 8-Bit Non-Volatile Computing-in-Memory Core for Low-Power Parallel MAC Operations.

A Low-Power In-Memory Multiplication and Accumulation Array with Modified Radix-4 Input and Canonical Signed Digit Weights

An 8-Bit in Resistive Memory Computing Core with Regulated Passive Neuron and Bitline Weight Mapping

A Heterogeneous Microprocessor for Intermittent AI Inference Using Nonvolatile-SRAM-based Compute-In-Memory

CMOS-integrated memristive non-volatile computing-in-memory for AI edge processors

A Heterogeneous Microprocessor Based on All-Digital Compute-in-Memory for End-to-End AIoT Inference

Edge AI without Compromise: Efficient, Versatile and Accurate Neurocomputing in Resistive Random-Access Memory

NS-CIM: A Current-Mode Computation-in-Memory Architecture Enabling Near-Sensor Processing for Intelligent IoT Vision Nodes.

Challenges and Trends in Developing Nonvolatile Memory-Enabled Computing Chips for Intelligent Edge Devices

Challenges and Trends of SRAM-Based Computing-In-Memory for AI Edge Devices

A compute-in-memory chip based on resistive random-access memory

Containing Analog Data Deluge at Edge through Frequency-Domain Compression in Collaborative Compute-in-Memory Networks

Energy-efficient SNN Architecture using 3nm FinFET Multiport SRAM-based CIM with Online Learning

RDCIM: RISC-V Supported Full-Digital Computing-in-Memory Processor With High Energy Efficiency and Low Area Overhead

An eDRAM-Based Computing-in-Memory Macro with Full-Valid-Storage and Channel-Wise-Parallelism for Depthwise Neural Network

Computing In-Memory, Revisited

Overflow-free Compute Memories for Edge AI Acceleration

A 28-Nm Compute SRAM with Bit-Serial Logic/Arithmetic Operations for Programmable In-Memory Vector Computing