Abstract: The energy consumed in training and deploying state-of-the-art artificial intelligence (AI) models has grown exponentially, driven by increases in model parameter counts and in the training data they require. Recent models demand over 1 billion PetaFLOPs of total computation for training, which can take weeks to complete. With current GPU performance at approximately 1-3 TOPS/W, GPU training energy consumption alone can exceed 100,000 kWh, equivalent to the monthly energy expenditure of 100 US households. Over the next decade, these models are expected to scale up further, so that total computing energy becomes a significant portion of global consumption. In this talk, we describe both synaptic and neuronal devices that can accelerate AI algorithms with potentially multiple orders of magnitude improvement in power efficiency. First, we describe an oscillatory retinal neuron (ORN) that directly converts incident DC light into voltage spikes. Uniquely, the conversion from input light to output voltage spikes occurs without external power. Coupled arrays of this device form an imager that carries out in-sensor processing while capturing an image: neighboring neurons interact with each other to influence the spiking frequency spectrum, which allows the array to carry out frequency-multiplexed computation on an input image. It is shown that this approach can achieve >20,000 TOPS/W, multiple orders of magnitude greater than current approaches. Theory and simulation are used to elucidate how coupled ORNs carry out computation on an input image. Each output frequency band encodes a unique computation on the input image, so user-defined computations can be carried out by tuning the coupling impedances and the frequency bands. We show this experimentally with a 3x3 array that demonstrates simultaneous edge detection, intensity filtering, image segmentation, and other functions. This hardware is then used to demonstrate improved MNIST handwritten digit classification accuracy over a traditional imager connected to a fully connected network.
Next, we describe spiking synaptic devices that can be fabricated directly in the back end of line of CMOS devices. These devices consist of an InP transistor channel with an engineered gate stack. Using a uniform gate insulator, we demonstrate behaviors of biological synapses such as potentiation, depression, spike-number-dependent plasticity, and spike-timing-dependent plasticity. By introducing a heterostructured gate insulator, it is shown that short-term to long-term memory transitions can be designed into the device. Finally, by using a transparent gate, we demonstrate an in-sensor synaptic phototransistor and show the performance of these devices at the system level.
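
The training-energy figure quoted in the abstract can be checked with simple arithmetic. The sketch below is only a back-of-the-envelope estimate under stated assumptions: it treats the 1 billion PetaFLOPs as 10^24 operations, treats floating-point operations and the "operations" in TOPS/W as interchangeable, and assumes the hardware runs at its quoted efficiency for the whole job (no idle time, cooling, or data-movement overhead). The function name `training_energy_kwh` is illustrative and not taken from the talk.

```python
# Back-of-the-envelope check of the training-energy estimate in the abstract.
# Assumptions (not from the talk): FLOPs and "operations" in TOPS/W are treated
# as interchangeable, and the hardware sustains its quoted efficiency throughout.

JOULES_PER_KWH = 3.6e6  # 1 kWh = 3.6 MJ


def training_energy_kwh(total_ops: float, ops_per_joule: float) -> float:
    """Energy in kWh needed to execute total_ops at the given ops-per-joule."""
    return total_ops / ops_per_joule / JOULES_PER_KWH


TOTAL_OPS = 1e9 * 1e15  # 1 billion peta-operations = 1e24 operations

for tops_per_watt in (1, 3):
    # 1 TOPS/W = 1e12 operations per second per watt = 1e12 operations per joule
    ops_per_joule = tops_per_watt * 1e12
    kwh = training_energy_kwh(TOTAL_OPS, ops_per_joule)
    print(f"{tops_per_watt} TOPS/W -> {kwh:,.0f} kWh")
```

Under these assumptions the estimate is roughly 278,000 kWh at 1 TOPS/W and roughly 93,000 kWh at 3 TOPS/W, which is consistent with the abstract's statement that training energy "can exceed 100,000 kWh". The household comparison in the abstract corresponds to about 1,000 kWh per household per month.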

Hardware Implementation of Energy Efficient Deep Learning Neural Network Based on Nanoscale Flash Computing Array

DaDianNao: A Machine-Learning Supercomputer

Analog Deep Neural Network Based On Nor Flash Computing Array For High Speed/Energy Efficiency Computation

Flash Memory Array for Efficient Implementation of Deep Neural Networks

Neural Synaptic Plasticity-Inspired Computing: A High Computing Efficient Deep Convolutional Neural Network Accelerator

A 3.89-Gops/mw Scalable Recurrent Neural Network Processor with Improved Efficiency on Memory and Computation

Towards Ultra-High Performance and Energy Efficiency of Deep Learning Systems: An Algorithm-Hardware Co-Optimization Framework

Floating Gate Transistor‐Based Accurate Digital In‐Memory Computing for Deep Neural Networks

Efficient Hardware Optimization Strategies For Deep Neural Networks Acceleration Chip

An Energy-Efficient Deep Belief Network Processor Based on Heterogeneous Multi-Core Architecture With Transposable Memory and On-Chip Learning

Special Topic on Nonvolatile Memory for Efficient Implementation of Neural/Neuromorphic Computing

A Power Efficient Neural Network Implementation on Heterogeneous FPGA and GPU Devices

Neural Networks on Chip: from CMOS Accelerators to In-Memory-Computing

A Scatter-and-Gather Spiking Convolutional Neural Network on a Reconfigurable Neuromorphic Hardware

An Energy-Efficient Convolutional Neural Network Processor Architecture Based on a Systolic Array

Optimized operation scheme of flash-memory-based neural network online training with ultra-high endurance

(Invited) Nanoscale Devices for Accelerating AI Algorithms

Minimizing Area and Energy of Deep Learning Hardware Design Using Collective Low Precision and Structured Compression

Efficient Discrete Temporal Coding Spike-Driven In-Memory Computing Macro for Deep Neural Network Based on Nonvolatile Memory

A High Energy-Efficiency Multi-core Neuromorphic Architecture for Deep SNN Training

Flash-based Computing In-Memory Scheme for IOT