Abstract: Second-order training methods can converge much faster than first-order optimizers in DNN training because they use the inverse of the second-order information (SOI) matrix to find a more accurate descent direction and step size. However, the huge SOI matrices incur significant computational and memory overheads on traditional architectures such as GPUs and CPUs. In contrast, ReRAM-based processing-in-memory (PIM) technology is well suited to second-order training for three reasons: first, PIM computation happens in memory, which reduces data-movement overheads; second, ReRAM crossbars can compute the SOI inverse in $O\left(1\right)$ time; third, if architected properly, ReRAM crossbars can perform both matrix inversion and vector-matrix multiplication, which are important to second-order training algorithms. Nevertheless, current ReRAM-based PIM techniques still face a key challenge in accelerating second-order training: existing ReRAM-based matrix inversion circuitry supports only 8-bit precision, which is insufficient for second-order training, where matrix inversion must be accurate to at least 16 bits. In this work, we propose a method to achieve high-precision matrix inversion based on proven 8-bit matrix inversion (INV) circuitry and vector-matrix multiplication (VMM) circuitry. We design \archname{}, a ReRAM-based PIM accelerator architecture for second-order training. Moreover, we propose a software mapping scheme for \archname{} that further optimizes performance by fusing VMM and INV crossbars. Experiments show that \archname{} achieves an average of 115.8$\times$/11.4$\times$ speedup and 41.9$\times$/12.8$\times$ energy saving compared to a GPU counterpart and PipeLayer, respectively, on large-scale DNNs.
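The abstract does not spell out how an 8-bit INV circuit and a VMM circuit combine into a higher-precision inversion. One classical technique that fits this description is iterative refinement: apply the low-precision inverse to the current residual and accumulate the correction. The sketch below is an illustration under that assumption, not necessarily the method used in the paper. All function names are hypothetical, the 8-bit INV circuit is modeled by quantizing an exact inverse, and the residual is computed in floating point rather than on an analog crossbar.

```python
def matvec(M, v):
    """Matrix-vector product; stands in for the VMM circuitry."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]


def invert(A):
    """Exact inverse via Gauss-Jordan elimination with partial pivoting."""
    n = len(A)
    aug = [list(map(float, row)) + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [a - f * b for a, b in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def quantize(M, bits=8):
    """Round entries to a signed `bits`-bit grid scaled to the largest entry."""
    levels = 2 ** (bits - 1) - 1
    scale = max(abs(v) for row in M for v in row) / levels
    return [[round(v / scale) * scale for v in row] for row in M]


def residual_norm(A, x, b):
    """Max-norm of the residual b - A x."""
    return max(abs(bi - ai) for bi, ai in zip(b, matvec(A, x)))


def refine_solve(A, b, iters=10, bits=8):
    """Solve A x = b using only a low-precision inverse plus residual updates."""
    inv_lp = quantize(invert(A), bits)   # assumption: models the 8-bit INV output
    x = [0.0] * len(b)
    for _ in range(iters):
        r = [bi - ai for bi, ai in zip(b, matvec(A, x))]  # residual via VMM
        dx = matvec(inv_lp, r)                            # low-precision correction
        x = [xi + di for xi, di in zip(x, dx)]            # accumulate
    return x
```

With `iters=1` the function returns the plain low-precision solution. Each further iteration shrinks the error by roughly the relative error of the quantized inverse, provided the matrix is well conditioned enough for the iteration to converge.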

ARCHER: a ReRAM-based Accelerator for Compressed Recommendation Systems

PIM-DH: ReRAM-based Processing-in-Memory Architecture for Deep Hashing Acceleration

Rerec: In-ReRAM Acceleration with Access-Aware Mapping for Personalized Recommendation

RecNMP: Accelerating Personalized Recommendation with Near-Memory Processing

ARAS: An Adaptive Low-Cost ReRAM-Based Accelerator for DNNs

Pointer: An Energy-Efficient ReRAM-based Point Cloud Recognition Accelerator with Inter-layer and Intra-layer Optimizations

A Flexible Embedding-Aware Near Memory Processing Architecture for Recommendation System

GraphR: Accelerating Graph Processing Using ReRAM

MicroRec: Efficient Recommendation Inference by Hardware and Data Structure Solutions

iMARS: An In-Memory-Computing Architecture for Recommendation Systems

HURRY: Highly Utilized, Reconfigurable ReRAM-based In-situ Accelerator with Multifunctionality

A Reduced Architecture for ReRAM-Based Neural Network Accelerator and Its Software Stack

UpDLRM: Accelerating Personalized Recommendation using Real-World PIM Architecture

GraphSAR: A Sparsity-Aware Processing-in-Memory Architecture for Large-Scale Graph Processing on ReRAMs

ERA-LSTM: An Efficient ReRAM-Based Architecture for Long Short-Term Memory

RePAST: A ReRAM-based PIM Accelerator for Second-order Training of DNN

RIMAC: an Array-Level ADC/DAC-Free ReRAM-Based In-Memory DNN Processor with Analog Cache and Computation

FPSA: A Full System Stack Solution for Reconfigurable ReRAM-based NN Accelerator Architecture

NDRec: A Near-Data Processing System for Training Large-Scale Recommendation Models

AERIS: Area/Energy-Efficient 1T2R ReRAM Based Processing-in-Memory Neural Network System-on-a-Chip

AUTOHET: an Automated Heterogeneous ReRAM-Based Accelerator for DNN Inference