Manuel Le Gallo, Riduan Khaddam-Aljameh, Milos Stanisavljevic, Athanasios Vasilopoulos, Benedikt Kersting, Martino Dazzi, Geethan Karunaratne, Matthias Braendli, Abhairaj Singh, Silvia M. Mueller, Julian Buechel, Xavier Timoneda, Vinay Joshi, Urs Egger, Angelo Garofalo, Anastasios Petropoulos, Theodore Antonakopoulos, Kevin Brew, Samuel Choi, Injo Ok, Timothy Philip, Victor Chan, Claire Silvestre, Ishtiaq Ahsan, Nicole Saulnier, Vijay Narayanan, Pier Andrea Francese, Evangelos Eleftheriou, Abu Sebastian

Abstract: The need to repeatedly shuttle synaptic weight values between memory and processing units has been a key source of energy inefficiency in hardware implementations of artificial neural networks. Analog in-memory computing (AIMC) with spatially instantiated synaptic weights holds high promise to overcome this challenge by performing matrix-vector multiplications (MVMs) directly in the memory where the network weights are stored on chip to execute an inference workload. However, to achieve end-to-end improvements in latency and energy consumption, AIMC must be combined with on-chip digital operations and communication, moving towards configurations in which a full inference workload is realized entirely on-chip. Moreover, it is highly desirable to achieve high MVM and inference accuracy without application-specific re-tuning of the chip. Here, we present a multi-core AIMC chip designed and fabricated in 14-nm complementary metal-oxide-semiconductor (CMOS) technology with backend-integrated phase-change memory (PCM). The fully integrated chip features 64 AIMC cores of size 256×256, interconnected via an on-chip communication network. It also implements the digital activation functions and processing involved in ResNet convolutional neural networks and long short-term memory (LSTM) networks. We demonstrate near software-equivalent inference accuracy with ResNet and LSTM networks while implementing all the computations associated with the weight layers and the activation functions on-chip. The chip can achieve a maximum throughput of 63.1 TOPS at an energy efficiency of 9.76 TOPS/W for 8-bit input/output matrix-vector multiplications.
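As a rough illustration of the 8-bit input/output MVMs and the headline figures described in the abstract, the following Python sketch (standard library only) simulates one 256×256 core and works through the throughput arithmetic. It is a hypothetical toy model, not the chip's actual circuitry: the function names `quantize`, `program_weights`, and `analog_mvm`, the Gaussian weight-error model standing in for PCM conductance variability, the symmetric 8-bit input (DAC) and output (ADC) quantizers, the ADC range `y_scale`, and the convention of counting one multiply-accumulate as two operations are all assumptions made for illustration.

```python
import random

# Core count and per-core array size as stated in the abstract.
N_CORES = 64
ROWS = COLS = 256


def quantize(value, full_scale, bits=8):
    """Symmetric uniform quantizer (assumed): maps value in [-full_scale, full_scale]
    to a signed integer code in [-(2**(bits-1) - 1), 2**(bits-1) - 1], clipping outside."""
    levels = 2 ** (bits - 1) - 1  # 127 for 8 bits
    code = round(value / full_scale * levels)
    return max(-levels, min(levels, code))


def dequantize(code, full_scale, bits=8):
    """Inverse of quantize(): integer code back to a real value."""
    return code * full_scale / (2 ** (bits - 1) - 1)


def program_weights(weights, noise_std, rng):
    """Hypothetical programming model: each normalized weight picks up independent
    Gaussian error, a crude stand-in for PCM conductance variability."""
    return [[w + rng.gauss(0.0, noise_std) for w in row] for row in weights]


def analog_mvm(stored, x, x_scale=1.0, y_scale=None, bits=8):
    """Simulated `bits`-in / `bits`-out MVM on one tile.

    Inputs pass through a DAC (quantize), the array accumulates each row
    (exact float sum here, i.e. no circuit non-idealities), and an ADC
    quantizes each row result back to `bits` bits over [-y_scale, y_scale].
    """
    if y_scale is None:
        y_scale = float(len(x))  # worst-case range for |w|, |x| <= 1 (assumed default)
    x_q = [dequantize(quantize(v, x_scale, bits), x_scale, bits) for v in x]
    y_analog = [sum(w * v for w, v in zip(row, x_q)) for row in stored]
    return [quantize(y, y_scale, bits) for y in y_analog]


def ops_per_mvm(rows=ROWS, cols=COLS):
    """Operations in one MVM, counting each multiply-accumulate as 2 ops (assumed convention)."""
    return 2 * rows * cols


def implied_mvm_latency_ns(total_tops, n_cores=N_CORES):
    """Time per MVM per core if peak throughput is spread evenly across all cores (assumption)."""
    mvm_per_s = total_tops * 1e12 / n_cores / ops_per_mvm()
    return 1e9 / mvm_per_s


def energy_per_op_fj(tops_per_watt):
    """1 TOPS/W = 1e12 op/J, i.e. 1 pJ (1000 fJ) per operation."""
    return 1000.0 / tops_per_watt


if __name__ == "__main__":
    rng = random.Random(0)
    W = [[rng.uniform(-1, 1) for _ in range(COLS)] for _ in range(ROWS)]
    x = [rng.uniform(-1, 1) for _ in range(COLS)]
    ideal = [sum(w * v for w, v in zip(row, x)) for row in W]
    codes = analog_mvm(program_weights(W, 0.05, rng), x, y_scale=64.0)
    y = [dequantize(c, 64.0) for c in codes]
    rms = (sum((a - b) ** 2 for a, b in zip(y, ideal)) / ROWS) ** 0.5
    print(f"RMS error vs. float MVM: {rms:.3f}")
    print(f"ops per 256x256 MVM: {ops_per_mvm()}")
    print(f"energy per op at 9.76 TOPS/W: {energy_per_op_fj(9.76):.1f} fJ")
    print(f"implied per-core MVM time at 63.1 TOPS: {implied_mvm_latency_ns(63.1):.0f} ns")
```

Under these assumptions, 9.76 TOPS/W corresponds to about 102.5 fJ per operation, and spreading 63.1 TOPS evenly over 64 cores implies roughly 133 ns per 256×256 MVM per core. Both are back-of-the-envelope derivations from the abstract's numbers, not measurements reported there.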

Neural Networks on Chip: from CMOS Accelerators to In-Memory-Computing

A Near Memory Computing FPGA Architecture for Neural Network Acceleration

Neural Network Acceleration and Voice Recognition with a Flash-based In-Memory Computing SoC

DaDianNao: A Machine-Learning Supercomputer

Special Topic on Nonvolatile Memory for Efficient Implementation of Neural/Neuromorphic Computing

Flash Memory Array for Efficient Implementation of Deep Neural Networks

Accelerating Neural Network Inference with Processing-in-DRAM: From the Edge to the Cloud

Computing Utilization Enhancement for Chiplet-based Homogeneous Processing-in-Memory Deep Learning Processors

Floating Gate Transistor‐Based Accurate Digital In‐Memory Computing for Deep Neural Networks

A 64-core mixed-signal in-memory compute chip based on phase-change memory for deep neural network inference

In-memory Computing with Emerging Nonvolatile Memory Devices

In-Memory Computing: The Next-Generation AI Computing Paradigm

A Co-design view of Compute in-Memory with Non-Volatile Elements for Neural Networks

Hardware Implementation of Energy Efficient Deep Learning Neural Network Based on Nanoscale Flash Computing Array

On Designing Efficient and Reliable Nonvolatile Memory-Based Computing-In-Memory Accelerators

Architectural Implications of Neural Network Inference for High Data-Rate, Low-Latency Scientific Applications

Towards Efficient Neural Networks On-a-chip: Joint Hardware-Algorithm Approaches

A NoC-Based Spatial DNN Inference Accelerator with Memory-Friendly Dataflow

Breaking the Memory Wall for AI Chip with a New Dimension

A Low-Latency DNN Accelerator Enabled by DFT-Based Convolution Execution Within Crossbar Arrays

High-Throughput In-Memory Computing for Binary Deep Neural Networks with Monolithically Integrated RRAM and 90nm CMOS