Abstract: Tiny machine learning (TinyML) envisions executing deep neural network (DNN)-based inference on edge devices to improve battery life, latency, security, and privacy. Toward this vision, recent microcontroller units (MCUs) integrate in-memory computing (IMC) hardware to leverage its high energy efficiency and throughput in vector–matrix multiplication (VMM). However, these existing works require large IMC hardware, which severely increases the area overhead. In addition, most of them use analog–mixed-signal (AMS) IMC hardware, which has limited robustness to process, voltage, and temperature (PVT) variations. Finally, none of them supports a practical software development framework such as TensorFlow Lite for Microcontrollers (TFLite-micro). Because of these limitations, those MCUs did not report performance on the standard MLPerf-Tiny benchmark, which makes it difficult to compare them against state-of-the-art neural MCUs (not necessarily IMC-based). In this article, we design a new IMC-based MCU for TinyML, titled iMCU, to address these challenges. In the design process, we: 1) define the optimal set of acceleration targets and 2) devise an area-efficient computation flow that requires the least amount of IMC hardware yet still provides significant acceleration. In addition, we: 1) develop state-of-the-art digital IMC macros and 2) build an accelerator based on these macros that supports the proposed computation flow in a fully pipelined manner. Combining these innovations, we prototyped the iMCU in 28-nm CMOS. Measurement results show that the iMCU significantly outperforms prior IMC-based MCUs in compute density, energy efficiency, and SRAM density (total SRAM size/total SRAM area). It also achieves a compact footprint of 2.73 mm².
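To make the VMM claim concrete, here is a generic sketch of how a digital IMC macro commonly evaluates a vector–matrix multiplication in bit-serial form. Each input bit gates the weights stored in the SRAM array, a per-column adder tree sums the gated weights, and a shift-and-accumulate stage combines the per-bit partial sums. The sketch assumes signed two's-complement inputs and integer weights. The function name `digital_imc_vmm` and its parameters are hypothetical. This is not the iMCU macro design, because the abstract does not describe that design.

```python
import numpy as np


def digital_imc_vmm(x, W, in_bits=8):
    """Hypothetical bit-serial VMM, modeled loosely on a generic digital IMC macro.

    x: 1-D signed integer input vector of length R (two's complement, in_bits wide)
    W: R x C signed integer weight matrix, assumed to be stored in the SRAM array
    Returns the length-C result, which equals x @ W.
    """
    x = np.asarray(x, dtype=np.int64)
    W = np.asarray(W, dtype=np.int64)
    lo, hi = -(1 << (in_bits - 1)), (1 << (in_bits - 1)) - 1
    if x.min() < lo or x.max() > hi:
        raise ValueError("input out of range for the chosen bit width")

    # Two's-complement bit pattern of each input element.
    x_bits = x & ((1 << in_bits) - 1)
    acc = np.zeros(W.shape[1], dtype=np.int64)
    for b in range(in_bits):
        # One input bit per row, as if driven onto the wordlines in cycle b.
        bit = (x_bits >> b) & 1
        # 1-bit x multi-bit gating in each cell, then a per-column adder tree.
        partial = (bit[:, None] * W).sum(axis=0)
        # Shift-and-accumulate; in two's complement the MSB has negative weight.
        scale = -(1 << b) if b == in_bits - 1 else (1 << b)
        acc += scale * partial
    return acc


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x = rng.integers(-128, 128, size=16)
    W = rng.integers(-128, 128, size=(16, 4))
    assert np.array_equal(digital_imc_vmm(x, W), x @ W)
```

Processing one input bit per cycle keeps each cell's logic to a simple gate and moves all the arithmetic into a shared adder tree. Digital IMC macros typically adopt this trade-off to stay robust to PVT variations, unlike AMS designs.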

A Heterogeneous In-Memory Computing Cluster For Flexible End-to-End Inference of Real-World Deep Neural Networks

DaDianNao: A Machine-Learning Supercomputer

End-to-End DNN Inference on a Massively Parallel Analog In Memory Computing Architecture

A Heterogeneous Microprocessor Based on All-Digital Compute-in-Memory for End-to-End AIoT Inference

A 64-core mixed-signal in-memory compute chip based on phase-change memory for deep neural network inference

Hadamard Product-Based In-Memory Computing Design for Floating Point Neural Network Training

A Heterogeneous Microprocessor for Intermittent AI Inference Using Nonvolatile-SRAM-based Compute-In-Memory

Scale up your In-Memory Accelerator: Leveraging Wireless-on-Chip Communication for AIMC-based CNN Inference

An in-memory computing architecture based on a duplex 2D material structure for in-situ machine learning

A Brain-Inspired Hierarchical Interactive In-Memory Computing System and Its Application in Video Sentiment Analysis

In-Memory Computing: Advances and Prospects

SIAM: Chiplet-based Scalable In-Memory Acceleration with Mesh for Deep Neural Networks

iMCU: A 28-nm Digital In-Memory Computing-Based Microcontroller Unit for TinyML

MAICC: A Lightweight Many-Core Architecture with In-Cache Computing for Multi-DNN Parallel Inference

Design and Implementation of a Charge-Sharing In-Memory-computing Macro with Sparse Feature for Quantized Neural Network

Analog or Digital In-memory Computing? Benchmarking through Quantitative Modeling

StoX-Net: Stochastic Processing of Partial Sums for Efficient In-Memory Computing DNN Accelerators

Device and Circuit Architectures for In-Memory Computing

Containing Analog Data Deluge at Edge through Frequency-Domain Compression in Collaborative Compute-in-Memory Networks

Quartet: A 22nm 0.09mJ/Inference Digital Compute-in-Memory Versatile AI Accelerator with Heterogeneous Tensor Engines and Off-Chip-Less Dataflow

16.5 DynaPlasia: An eDRAM In-Memory-Computing-Based Reconfigurable Spatial Accelerator with Triple-Mode Cell for Dynamic Resource Switching