Scalable and RISC-V Programmable Near-Memory Computing Architectures for Edge Nodes

Michele Caon, Clément Choné, Pasquale Davide Schiavone, Alexandre Levisse, Guido Masera, Maurizio Martina, David Atienza
2024-06-20
Abstract: The widespread adoption of data-centric algorithms, particularly Artificial Intelligence (AI) and Machine Learning (ML), has exposed the limitations of centralized processing infrastructures, driving a shift towards edge computing. This necessitates stringent constraints on energy efficiency, which traditional von Neumann architectures struggle to meet. The Compute-In-Memory (CIM) paradigm has emerged as a superior candidate due to its efficient exploitation of available memory bandwidth. However, existing CIM solutions require high implementation effort and lack flexibility from a software integration standpoint. This work proposes a novel, software-friendly, general-purpose, and low-integration-effort Near-Memory Computing (NMC) approach, paving the way for the adoption of CIM-based systems in the next generation of edge computing nodes. Two architectural variants, NM-Caesar and NM-Carus, are proposed and characterized to target different trade-offs in area efficiency, performance, and flexibility, covering a wide range of embedded microcontrollers. Post-layout simulations show up to $25.8\times$ and $50.0\times$ lower execution time and $23.2\times$ and $33.1\times$ higher energy efficiency at the system level, respectively, compared to executing the same tasks on a state-of-the-art RISC-V CPU (RV32IMC). NM-Carus achieves a peak energy efficiency of $306.7$ GOPS/W in 8-bit matrix multiplications, surpassing recent state-of-the-art in- and near-memory circuits.
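Editor's note: the peak efficiency figure can be read as an energy cost per operation. As a rough conversion, assuming only that 1 GOPS/W equals $10^9$ operations per joule and taking "operation" as the abstract counts it (the counting convention is not stated here):

$$
E_{\text{op}} = \frac{1}{306.7 \times 10^{9}\ \text{ops/J}} \approx 3.26 \times 10^{-12}\ \text{J} = 3.26\ \text{pJ per operation}
$$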
Hardware Architecture
What problem does this paper attempt to address?
The paper addresses the inefficiency of traditional von Neumann architectures in handling data-intensive workloads at the edge. As data-driven algorithms spread, especially in artificial intelligence and machine learning, the limitations of centralized processing infrastructure have become increasingly apparent, prompting a shift towards edge computing. Edge computing nodes, however, operate under strict energy efficiency constraints that traditional von Neumann architectures struggle to meet. To this end, the paper proposes a Near-Memory Computing (NMC) approach that is software-friendly, general-purpose, and easy to integrate, targeting next-generation edge computing nodes.

The main contributions of the paper are:

1. Proposing a software-friendly, low-cost, and easy-to-integrate NMC approach suitable for general-purpose low-power edge devices, and validating it by implementing two architectural variants that target different classes of embedded Systems-on-Chip (SoCs).
2. Designing an efficient custom RISC-V Instruction Set Architecture (ISA) extension that supports flexible vector operations on a programmable CIM architecture, providing a vector view of the computing memory seen by the host, without explicit vector load/store operations.
3. Conducting an in-depth, quantitative analysis of the impact and advantages of replacing traditional SRAM banks with the proposed NMC macros in low-power microcontroller units (MCUs).

The paper proposes two architectural variants:

- **NM-Caesar**: An area-efficient NMC unit supporting SIMD operations, controlled by the host system's microcontroller, suitable for regular TinyML benchmarks such as peak detection algorithms and lightweight artificial neural networks.
- **NM-Carus**: A fully autonomous, RISC-V programmable NMC unit with vector operation support, suitable for highly parallel and complex TinyML applications such as deep neural networks or tasks with data-dependent control flow.

Both architectures are designed to be as close as possible to drop-in replacements for traditional embedded SRAM banks, providing a physically and functionally SRAM-compatible interface with transparent memory operation modes. Their focus is on programmability and ease of integration, in order to overcome the barriers that have limited the adoption of existing CIM architectures.
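For orientation, the 8-bit matrix multiplication named in the abstract can be written as a plain scalar loop nest of the kind a CPU baseline would execute. The sketch below is an illustrative reference only: the function name `matmul_i8`, the row-major layout, and the 32-bit accumulator width are assumptions made here, not details taken from the paper, and the code uses neither the NM-Caesar command interface nor the NM-Carus ISA extension.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical scalar reference kernel (not from the paper):
 * computes C = A * B for row-major matrices, where
 *   A is m x k and B is k x n, both with signed 8-bit elements,
 *   C is m x n with signed 32-bit elements.
 * Products are accumulated in 32 bits to avoid overflowing int8_t.
 */
void matmul_i8(const int8_t *a, const int8_t *b, int32_t *c,
               size_t m, size_t k, size_t n)
{
    for (size_t i = 0; i < m; i++) {
        for (size_t j = 0; j < n; j++) {
            int32_t acc = 0;
            for (size_t p = 0; p < k; p++) {
                /* One multiply and one add per inner iteration. */
                acc += (int32_t)a[i * k + p] * (int32_t)b[p * n + j];
            }
            c[i * n + j] = acc;
        }
    }
}
```

Each inner iteration touches one element of `a` and one of `b` for a single multiply-accumulate, so on a scalar core the kernel is dominated by memory traffic. This is the access pattern that SIMD or vector execution placed next to the memory array is meant to shorten.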