Abstract: Realizing today's cloud-level artificial intelligence functionalities directly on devices distributed at the edge of the internet calls for edge hardware capable of processing multiple modalities of sensory data (e.g. video, audio) with unprecedented energy efficiency. Today's AI hardware architectures cannot meet this demand because of a fundamental "memory wall": moving data between separate compute and memory units consumes substantial energy and incurs long latency. Compute-in-memory (CIM) architectures based on resistive random-access memory (RRAM) promise orders-of-magnitude improvements in energy efficiency by performing computation directly within memory. However, conventional approaches to CIM hardware design limit the functional flexibility needed to process diverse AI workloads, and they must overcome hardware imperfections that degrade inference accuracy. These trade-offs between efficiency, versatility and accuracy cannot be resolved by isolated improvements at any single level of the design. By co-optimizing across all hierarchies of the design, from algorithms and architecture to circuits and devices, we present NeuRRAM, the first multimodal edge AI chip using RRAM CIM to simultaneously deliver a high degree of versatility for diverse model architectures, record energy efficiency $5\times$ to $8\times$ better than prior art across various computational bit-precisions, and inference accuracy comparable to software models with 4-bit weights on all measured standard AI benchmarks. These include 99.0% accuracy on MNIST and 85.7% on CIFAR-10 image classification, 84.7% accuracy on Google speech command recognition, and a 70% reduction in image reconstruction error on a Bayesian image recovery task. This work paves the way towards building highly efficient and reconfigurable edge AI hardware platforms for the more demanding and heterogeneous AI applications of the future.

Edge AI without Compromise: Efficient, Versatile and Accurate Neurocomputing in Resistive Random-Access Memory
