Abstract:We propose a co-design approach for compute-in-memory inference for deep neural networks (DNN). We use multiplication-free function approximators based on $ell _{1}$ norm along with a co-adapted processing array and compute flow. Using the approach, we overcame many deficiencies in the current art of in-SRAM DNN processing such as the need for digital-to-analog converters (DACs) at each operating SRAM row/column, the need for high precision analog-to-digital converters (ADCs), limited support for multi-bit precision weights, and limited vector-scale parallelism. Our co-adapted implementation seamlessly extends to multi-bit precision weights, it doesn't require DACs, and it easily extends to higher vector-scale parallelism. We also propose an SRAM-immersed successive approximation ADC (SA-ADC), where we exploit the parasitic capacitance of bit lines of SRAM array as a capacitive DAC. Since the dominant area overhead in SA-ADC comes due to its capacitive DAC, by exploiting the intrinsic parasitic of SRAM array, our approach allows low area implementation of within-SRAM SA-ADC. Our $8times 62$ SRAM macro, which requires a 5-bit ADC, achieves ~105 tera operations per second per Watt (TOPS/W) with 8-bit input/weight processing at 45 nm CMOS. Our $8times 30$ SRAM macro, which requires a 4-bit ADC, achieves ~84 TOPS/W. SRAM macros that require lower ADC precision are more tolerant of process variability, however, have lower TOPS/W as well. We evaluated the accuracy and performance of our proposed network for MNIST, CIFAR10, and CIFAR100 datasets. We chose a network configuration which adaptively mixes multiplication-free and regular operators. The network configura-ions utilize the multiplication-free operator for more than 85% operations from the total. The selected configurations are 98.6% accurate for MNIST, 90.2% for CIFAR10, and 66.9% for CIFAR100. Since most of the operations in the considered configurations are based on proposed SRAM macros, our compute-in-memory's efficiency benefits broadly translate to the system-level.

Parallelizing SRAM Arrays with Customized Bit-Cell for Binary Neural Networks

Binary Neural Network with 16 Mb Rram Macro Chip for Classification and Online Training

Area and Energy Efficient Short-Circuit-Logic-Based STT-MRAM Crossbar Array for Binary Neural Networks

High-Throughput In-Memory Computing for Binary Deep Neural Networks with Monolithically Integrated RRAM and 90nm CMOS

Xcel-RAM: Accelerating Binary Neural Networks in High-Throughput SRAM Compute Arrays

SOT-MRAM-Based Design for Energy-Efficient and Reliable Binary Neural Network Acceleration

Binary Convolutional Neural Network on RRAM.

Hardware Implementation Of Rram Based Binarized Neural Networks

A Dual-Split 6T SRAM-Based Computing-in-Memory Unit-Macro With Fully Parallel Product-Sum Operation for Binarized DNN Edge Processors

An Improved RRAM-Based Binarized Neural Network with High Variation-Tolerated Forward/Backward Propagation Module

HXNOR-PBNN: A Scalable and Parallel Spintronics Synaptic Architecture for Probabilistic Binary Neural Networks

PXNOR-BNN: In/With Spin-Orbit Torque MRAM Preset-XNOR Operation-Based Binary Neural Networks.

XNOR-SRAM: In-Memory Computing SRAM Macro for Binary/Ternary Deep Neural Networks

Area Efficient Pattern Representation of Binary Neural Networks on RRAM

A 65nm 4kb Algorithm-Dependent Computing-in-Memory SRAM Unit-Macro with 2.3ns and 55.8TOPS/W Fully Parallel Product-Sum Operation for Binary DNN Edge Processors

HSC: A Hybrid Spin/CMOS Logic Based In-Memory Engine with Area-Efficient Mapping Strategy.

A Variation-Aware Binary Neural Network Framework for Process Resilient In-Memory Computations

CiM-BNN:Computing-in-MRAM Architecture for Stochastic Computing Based Bayesian Neural Network

MF-Net: Compute-In-Memory SRAM for Multibit Precision Inference Using Memory-Immersed Data Conversion and Multiplication-Free Operators

Neural Network-Hardware Co-design for Scalable RRAM-based BNN Accelerators

An Energy-Efficient In-Memory BNN Architecture with Time-Domain Analog and Digital Mixed-Signal Processing.