Abstract:We propose a co-design approach for compute-in-memory inference for deep neural networks (DNN). We use multiplication-free function approximators based on $ell _{1}$ norm along with a co-adapted processing array and compute flow. Using the approach, we overcame many deficiencies in the current art of in-SRAM DNN processing such as the need for digital-to-analog converters (DACs) at each operating SRAM row/column, the need for high precision analog-to-digital converters (ADCs), limited support for multi-bit precision weights, and limited vector-scale parallelism. Our co-adapted implementation seamlessly extends to multi-bit precision weights, it doesn't require DACs, and it easily extends to higher vector-scale parallelism. We also propose an SRAM-immersed successive approximation ADC (SA-ADC), where we exploit the parasitic capacitance of bit lines of SRAM array as a capacitive DAC. Since the dominant area overhead in SA-ADC comes due to its capacitive DAC, by exploiting the intrinsic parasitic of SRAM array, our approach allows low area implementation of within-SRAM SA-ADC. Our $8times 62$ SRAM macro, which requires a 5-bit ADC, achieves ~105 tera operations per second per Watt (TOPS/W) with 8-bit input/weight processing at 45 nm CMOS. Our $8times 30$ SRAM macro, which requires a 4-bit ADC, achieves ~84 TOPS/W. SRAM macros that require lower ADC precision are more tolerant of process variability, however, have lower TOPS/W as well. We evaluated the accuracy and performance of our proposed network for MNIST, CIFAR10, and CIFAR100 datasets. We chose a network configuration which adaptively mixes multiplication-free and regular operators. The network configura-ions utilize the multiplication-free operator for more than 85% operations from the total. The selected configurations are 98.6% accurate for MNIST, 90.2% for CIFAR10, and 66.9% for CIFAR100. Since most of the operations in the considered configurations are based on proposed SRAM macros, our compute-in-memory's efficiency benefits broadly translate to the system-level.

A Multiply-Less Approximate SRAM Compute-In-Memory Macro for Neural-Network Inference

A Low-Power In-Memory Multiplication and Accumulation Array with Modified Radix-4 Input and Canonical Signed Digit Weights

A Robust 8-Bit Non-Volatile Computing-in-Memory Core for Low-Power Parallel MAC Operations.

An 8-Bit in Resistive Memory Computing Core with Regulated Passive Neuron and Bitline Weight Mapping

7.3 A 28nm 38-to-102-tops/w 8b Multiply-Less Approximate Digital SRAM Compute-In-Memory Macro for Neural-Network Inference

A 1–8b Reconfigurable Digital SRAM Compute-in-Memory Macro for Processing Neural Networks

24.5 A Twin-8T SRAM Computation-In-Memory Macro for Multiple-Bit CNN-Based Machine Learning

MF-Net: Compute-In-Memory SRAM for Multibit Precision Inference Using Memory-Immersed Data Conversion and Multiplication-Free Operators

34.3 A 22nm 64kb Lightning-Like Hybrid Computing-in-Memory Macro with a Compressed Adder Tree and Analog-Storage Quantizers for Transformer and CNNs.

A 28nm 8Kb Reconfigurable SRAM Computing-In-Memory Macro With Input-Sparsity Optimized DTC for Multi-mode MAC Operations

A Digital SRAM Computing-in-Memory Design Utilizing Activation Unstructured Sparsity for High-Efficient DNN Inference

A 28-Nm RRAM Computing-in-Memory Macro Using Weighted Hybrid 2T1R Cell Array and Reference Subtracting Sense Amplifier for AI Edge Inference

A High-Density and Reconfigurable SRAM-Based Digital Compute-In-Memory Macro for Low-Power AI Chips.

A 28nm 32kb SRAM Computing-in-Memory Macro with Hierarchical Capacity Attenuator and Input Sparsity-Optimized ADC for 4b Mac Operation

A 65 Nm 73 Kb SRAM-Based Computing-In-Memory Macro with Dynamic-Sparsity Controlling

33.2 A Fully Integrated Analog ReRAM Based 78.4TOPS/W Compute-In-Memory Chip with Fully Parallel MAC Computing.

A Twin-8T SRAM Computation-in-Memory Unit-Macro for Multibit CNN-Based AI Edge Processors

An XOR-10T SRAM computing-in-memory macro with current MAC operations and time-to-digital conversion for BNN edge processors

An RRAM-Based Digital Computing-in-Memory Macro with Dynamic Voltage Sense Amplifier and Sparse-Aware Approximate Adder Tree

DS-CIM: A 40nm Asynchronous Dual-Spike Driven, MRAM Compute-In-Memory Macro for Spiking Neural Network

A 128 Kb DAC-less 6T SRAM computing-in-memory macro with prioritized subranging ADC for AI edge applications