Abstract: Realizing today's cloud-level artificial intelligence functionalities directly on devices distributed at the edge of the internet calls for edge hardware capable of processing multiple modalities of sensory data (e.g. video, audio) at unprecedented energy-efficiency. AI hardware architectures today cannot meet the demand due to a fundamental "memory wall": data movement between separate compute and memory units consumes substantial energy and incurs long latency. Resistive random-access memory (RRAM) based compute-in-memory (CIM) architectures promise to bring orders of magnitude energy-efficiency improvement by performing computation directly within memory. However, conventional approaches to CIM hardware design limit the functional flexibility needed to process diverse AI workloads, and they must overcome hardware imperfections that degrade inference accuracy. Such trade-offs between efficiency, versatility and accuracy cannot be addressed by isolated improvements on any single level of the design. By co-optimizing across all hierarchies of the design, from algorithms and architecture to circuits and devices, we present NeuRRAM, the first multimodal edge AI chip using RRAM CIM to simultaneously deliver a high degree of versatility for diverse model architectures, record energy-efficiency $5\times$ to $8\times$ better than prior art across various computational bit-precisions, and inference accuracy comparable to software models with 4-bit weights on all measured standard AI benchmarks, including accuracy of 99.0% on MNIST and 85.7% on CIFAR-10 image classification, 84.7% accuracy on Google speech command recognition, and a 70% reduction in image reconstruction error on a Bayesian image recovery task. This work paves a way towards building highly efficient and reconfigurable edge AI hardware platforms for the more demanding and heterogeneous AI applications of the future.

3U-EdgeAI: Ultra-Low Memory Training, Ultra-Low Bitwidth Quantization, and Ultra-Low Latency Acceleration

Condense: A Framework for Device and Frequency Adaptive Neural Network Models on the Edge

A Fine-Grained Mixed Precision DNN Accelerator Using a Two-Stage Big-Little Core RISC-V MCU

Low-Power Ultra-Small Edge AI Accelerators for Image Recognition with Convolution Neural Networks: Analysis and Future Directions

Understanding Sensor Data Using Deep Learning Methods on Resource-Constrained Edge Devices

Efficient Hardware Optimization Strategies For Deep Neural Networks Acceleration Chip

AccEPT: An Acceleration Scheme for Speeding Up Edge Pipeline-Parallel Training

Network-on-Chip-Centric Accelerator Architectures for Edge AI Computing

Edge AI: On-Demand Accelerating Deep Neural Network Inference via Edge Computing

Enabling Deep Learning on Edge Devices

An Overview of Energy-Efficient Hardware Accelerators for On-Device Deep-Neural-Network Training

AI on the Edge: Rethinking AI-based IoT Applications Using Specialized Edge Architectures

Memory Relevant Hyperparameters Optimization for DNN Training at Edge

Exploring In-Memory Accelerators and FPGAs for Latency-Sensitive DNN Inference on Edge Servers

Low- and Mixed-Precision Inference Accelerators

Research on Convolutional Neural Network Inference Acceleration and Performance Optimization for Edge Intelligence

A Review of Convolutional Neural Networks Hardware Accelerators for AIoT Edge Computing

An Efficient Hardware Architecture for DNN Training by Exploiting Triple Sparsity

Edge Intelligence: On-Demand Deep Learning Model Co-Inference with Device-Edge Synergy

Edge AI without Compromise: Efficient, Versatile and Accurate Neurocomputing in Resistive Random-Access Memory