Abstract: We propose a co-design approach for compute-in-memory inference of deep neural networks (DNNs). We use multiplication-free function approximators based on the $\ell_{1}$ norm, together with a co-adapted processing array and compute flow. With this approach, we overcome many deficiencies of current in-SRAM DNN processing: the need for digital-to-analog converters (DACs) at each operating SRAM row/column, the need for high-precision analog-to-digital converters (ADCs), limited support for multi-bit precision weights, and limited vector-scale parallelism. Our co-adapted implementation extends seamlessly to multi-bit precision weights, requires no DACs, and scales easily to higher vector-scale parallelism. We also propose an SRAM-immersed successive approximation ADC (SA-ADC), in which the parasitic capacitance of the SRAM array's bit lines serves as the capacitive DAC. Since the capacitive DAC dominates the area overhead of an SA-ADC, reusing the intrinsic parasitic capacitance of the SRAM array allows a low-area within-SRAM SA-ADC. Our $8\times 62$ SRAM macro, which requires a 5-bit ADC, achieves ~105 tera operations per second per watt (TOPS/W) with 8-bit input/weight processing in 45 nm CMOS. Our $8\times 30$ SRAM macro, which requires a 4-bit ADC, achieves ~84 TOPS/W. SRAM macros that require lower ADC precision are more tolerant of process variability but also achieve lower TOPS/W. We evaluated the accuracy and performance of our proposed network on the MNIST, CIFAR10, and CIFAR100 datasets. We chose network configurations that adaptively mix multiplication-free and regular operators, using the multiplication-free operator for more than 85% of all operations. The selected configurations achieve 98.6% accuracy on MNIST, 90.2% on CIFAR10, and 66.9% on CIFAR100.
Since most of the operations in these configurations run on the proposed SRAM macros, the efficiency benefits of our compute-in-memory design largely carry over to the system level.
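The abstract does not spell out the operator or the conversion step, so the following is only a behavioral sketch under stated assumptions. It assumes the $\ell_{1}$-norm-based multiplication-free operator from earlier multiplication-free network work, $x \oplus w = \sum_i \operatorname{sign}(x_i w_i)(|x_i| + |w_i|)$; the MF-Net macros may formulate or quantize it differently. It also models an idealized n-bit successive-approximation conversion as a MSB-first binary search against ideal DAC levels, ignoring the bit-line capacitance, noise, and process variation discussed above. The helper names `sign`, `mf_op`, `mf_op_signed_sums`, and `sar_adc` are hypothetical.

```python
def sign(v):
    """Return +1, -1, or 0 (hypothetical helper)."""
    return (v > 0) - (v < 0)


def mf_op(x, w):
    """Assumed l1-norm-based multiplication-free operator:
    sum_i sign(x_i * w_i) * (|x_i| + |w_i|).
    sign(x_i) * sign(w_i) only combines sign bits (an XOR in hardware);
    the magnitudes are added, never multiplied."""
    if len(x) != len(w):
        raise ValueError("x and w must have the same length")
    return sum(sign(xi) * sign(wi) * (abs(xi) + abs(wi)) for xi, wi in zip(x, w))


def mf_op_signed_sums(x, w):
    """Equivalent form: sum_i sign(w_i) * x_i + sum_i sign(x_i) * w_i.
    Each term is a conditional negation, so only sign flips and additions remain."""
    if len(x) != len(w):
        raise ValueError("x and w must have the same length")
    return sum(sign(wi) * xi + sign(xi) * wi for xi, wi in zip(x, w))


def sar_adc(v_in, v_ref, n_bits):
    """Ideal n-bit successive-approximation ADC (hypothetical helper).

    Resolves one bit per step, MSB first, by comparing v_in with the DAC level
    of the trial code. Returns floor(v_in / v_ref * 2**n_bits), clipped to
    the code range [0, 2**n_bits - 1].
    """
    code = 0
    for bit in reversed(range(n_bits)):
        trial = code | (1 << bit)              # tentatively set the current bit
        v_dac = trial * v_ref / (1 << n_bits)  # ideal DAC output for the trial code
        if v_in >= v_dac:                      # comparator decision keeps the bit
            code = trial
    return code


if __name__ == "__main__":
    x = [3, -1, 0, 2]
    w = [-2, -4, 5, 1]
    print(mf_op(x, w), mf_op_signed_sums(x, w))        # 3 3
    print(mf_op(x, x))                                 # 12, i.e. 2 * ||x||_1
    print(sar_adc(0.5, 1.0, 5), sar_adc(0.5, 1.0, 4))  # 16 8
```

Because $\operatorname{sign}(x_i w_i)|x_i| = \operatorname{sign}(w_i)\,x_i$, the operator splits into two sign-conditioned accumulations, and $x \oplus x = 2\lVert x\rVert_1$ shows its link to the $\ell_{1}$ norm. The two `sar_adc` calls illustrate the 5-bit versus 4-bit resolution trade-off only in this idealized model.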

Hybrid RRAM/SRAM in-Memory Computing for Robust DNN Acceleration

A Hybrid RRAM-SRAM Computing-In-Memory Architecture for Deep Neural Network Inference-Training Edge Acceleration

RRAM-DNN: an RRAM and Model-Compression Empowered All-Weights-On-Chip DNN Accelerator

SRAM-Based In-Memory Computing Macro Featuring Voltage-Mode Accumulator and Row-by-Row ADC for Processing Neural Networks

MF-Net: Compute-In-Memory SRAM for Multibit Precision Inference Using Memory-Immersed Data Conversion and Multiplication-Free Operators

TD-SRAM: Time-Domain-Based In-Memory Computing Macro for Binary Neural Networks

A Multiply-Less Approximate SRAM Compute-In-Memory Macro for Neural-Network Inference

High-Throughput In-Memory Computing for Binary Deep Neural Networks with Monolithically Integrated RRAM and 90nm CMOS

A Charge-Domain Scalable-Weight In-Memory Computing Macro With Dual-SRAM Architecture for Precision-Scalable DNN Accelerators

A 28-nm RRAM Computing-in-Memory Macro Using Weighted Hybrid 2T1R Cell Array and Reference Subtracting Sense Amplifier for AI Edge Inference

A compute-in-memory chip based on resistive random-access memory

CAP-RAM: A Charge-Domain In-Memory Computing 6T-SRAM for Accurate and Precision-Programmable CNN Inference

An ADC-less RRAM-based Computing-in-Memory Macro with Binary CNN for Efficient Edge AI

CREAM: Computing in ReRAM-Assisted Energy- and Area-Efficient SRAM for Reliable Neural Network Acceleration

HDC-IM: Hyperdimensional Computing In-Memory Architecture Based on RRAM

33.2 A Fully Integrated Analog ReRAM Based 78.4TOPS/W Compute-In-Memory Chip with Fully Parallel MAC Computing

Hadamard Product-Based In-Memory Computing Design for Floating Point Neural Network Training

Training Neural Networks With In-Memory-Computing Hardware and Multi-Level Radix-4 Inputs

Compensation Architecture Design Utilizing Residual Resource to Mitigate Impacts of Nonidealities in RRAM-based Computing-in-memory Chips

Intra-array Non-Idealities Modeling and Algorithm Optimization for RRAM-based Computing-in-Memory Applications

Bit-Aware Fault-Tolerant Hybrid Retraining and Remapping Schemes for RRAM-Based Computing-in-Memory Systems