Abstract: Second-order training methods can converge much faster than first-order optimizers in DNN training because they use the inverse of the second-order information (SOI) matrix to find a more accurate descent direction and step size. However, the large SOI matrices incur significant computational and memory overheads on traditional architectures such as GPUs and CPUs. In contrast, ReRAM-based processing-in-memory (PIM) technology is well suited to second-order training for three reasons. First, PIM computes in memory, which reduces data movement overheads. Second, ReRAM crossbars can compute the inverse of the SOI matrix in $O\left(1\right)$ time. Third, if architected properly, ReRAM crossbars can perform both matrix inversion and vector-matrix multiplication, which are important to second-order training algorithms. Nevertheless, current ReRAM-based PIM techniques still face a key challenge in accelerating second-order training: existing ReRAM-based matrix inversion circuitry supports only 8-bit precision, whereas second-order training needs matrix inversion that is accurate to at least 16 bits. In this work, we propose a method that achieves high-precision matrix inversion using proven 8-bit matrix inversion (INV) circuitry and vector-matrix multiplication (VMM) circuitry. We design RePAST, a ReRAM-based PIM accelerator architecture for second-order training. Moreover, we propose a software mapping scheme for RePAST that further optimizes performance by fusing VMM and INV crossbars. Experiments show that, on large-scale DNNs, RePAST achieves on average a 115.8$\times$/11.4$\times$ speedup and 41.9$\times$/12.8$\times$ energy saving compared to a GPU counterpart and PipeLayer, respectively.
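
The abstract does not say how the 8-bit INV and VMM circuits are combined to reach higher precision. As a rough illustration only, and not the authors' method, the sketch below uses the Newton-Schulz iteration, a well-known way to sharpen an approximate inverse using multiplications alone. The 8-bit inverse is simulated in software by quantizing an exact 2x2 inverse. All function names (`quantize`, `refine_inverse`, `max_error`, and so on) are hypothetical.

```python
def matmul(a, b):
    """Plain matrix product of two lists of rows (stands in for VMM)."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def inverse_2x2(m):
    """Exact inverse of a 2x2 matrix via the adjugate formula."""
    (a, b), (c, d) = m
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]


def quantize(m, bits=8):
    """Assumed model of a low-precision result: snap every entry to a
    symmetric signed grid with 2**(bits-1) - 1 levels per side."""
    scale = max(abs(v) for row in m for v in row)
    levels = 2 ** (bits - 1) - 1
    return [[round(v / scale * levels) / levels * scale for v in row]
            for row in m]


def refine_inverse(a, x0, steps):
    """Newton-Schulz iteration X <- X (2I - A X).
    Each step squares the residual I - A X, so precision roughly doubles."""
    n = len(a)
    x = x0
    for _ in range(steps):
        ax = matmul(a, x)
        corr = [[(2.0 if i == j else 0.0) - ax[i][j] for j in range(n)]
                for i in range(n)]
        x = matmul(x, corr)
    return x


def max_error(a, x):
    """Largest absolute entry of A X - I."""
    ax = matmul(a, x)
    n = len(a)
    return max(abs(ax[i][j] - (1.0 if i == j else 0.0))
               for i in range(n) for j in range(n))


if __name__ == "__main__":
    a = [[4.0, 1.0], [2.0, 3.0]]
    x0 = quantize(inverse_2x2(a), bits=8)  # simulated 8-bit inverse
    for steps in range(4):
        print(steps, max_error(a, refine_inverse(a, x0, steps)))
```

Running the script prints the residual after zero to three refinement steps. The point of the sketch is that a coarse inverse plus accurate multiplications can be enough to recover a fine inverse; how RePAST actually maps this onto crossbars is not described in the abstract.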

ERA-LSTM: An Efficient ReRAM-Based Architecture for Long Short-Term Memory

Long Short-Term Memory Implementation Exploiting Passive RRAM Crossbar Array

AERIS: Area/Energy-Efficient 1T2R ReRAM Based Processing-in-Memory Neural Network System-on-a-Chip

Long short-term memory networks in memristor crossbar arrays

SEAL-lab Technical Report – No. 2015-001 (April 29, 2016) Processing-in-Memory in ReRAM-based Main Memory

SEAL-lab Technical Report – No. 2015-001 (November 30, 2015) Processing-in-Memory in ReRAM-based Main Memory

Long short-term memory networks in memristor crossbars

ARAS: An Adaptive Low-Cost ReRAM-Based Accelerator for DNNs

SNrram: An Efficient Sparse Neural Network Computation Architecture Based on Resistive Random-Access Memory

Neural-PIM: Efficient Processing-In-Memory with Neural Approximation of Peripherals

TIME: A Training-in-Memory Architecture for RRAM-Based Deep Neural Networks

An Energy Efficient Computing-in-Memory Accelerator With 1T2R Cell and Fully Analog Processing for Edge AI Applications

A Reduced Architecture for ReRAM-Based Neural Network Accelerator and Its Software Stack

FORMS: Fine-grained Polarized ReRAM-based In-situ Computation for Mixed-signal DNN Accelerator

RePAST: A ReRAM-based PIM Accelerator for Second-order Training of DNN

Crossbar-Constrained Technology Mapping for ReRAM Based In-Memory Computing

GraphSAR: A Sparsity-Aware Processing-in-Memory Architecture for Large-Scale Graph Processing on ReRAMs

A Reconfigurable 4T2R ReRAM Computing In-Memory Macro for Efficient Edge Applications

A 28 nm 81 Kb 5995.3 TOPS/W 4T2R ReRAM Computing-in-Memory Accelerator With Voltage-to-Time-to-Digital Based Output

PRIME: A Novel Processing-in-Memory Architecture for Neural Network Computation in ReRAM-Based Main Memory