Abstract: Bayesian Neural Networks (BNNs) provide superior estimates of uncertainty by generating an ensemble of predictive distributions. However, inference via ensembling is resource-intensive, because it requires additional entropy sources to generate stochasticity. We introduce Bayes2IMC, an in-memory computing (IMC) architecture for binary Bayesian neural networks that leverages nanoscale device stochasticity to generate the desired distributions. Our approach uses the inherent noise characteristics of Phase-Change Memory (PCM) to implement a binary neural network. This design eliminates the need for a pre-neuron Analog-to-Digital Converter (ADC), significantly improving power and area efficiency. We also develop a hardware-software co-optimized correction method, applied solely to the logits in the final layer, that reduces device-induced accuracy variations across hardware deployments. Additionally, we devise a simple compensation technique that ensures no drop in classification accuracy despite the conductance drift of PCM. We validate our approach on the CIFAR-10 dataset with a VGGBinaryConnect model, achieving accuracy metrics comparable to ideal software implementations as well as to results reported in the literature for other technologies. Finally, we present a complete core architecture and compare its projected power, performance, and area efficiency against an equivalent SRAM baseline, showing a $3.8$ to $9.6\times$ improvement in total efficiency (in GOPS/W/mm$^2$) and a $2.2$ to $5.6\times$ improvement in power efficiency (in GOPS/W). In addition, the projected hardware performance of Bayes2IMC surpasses that of most memristive-device-based BNN architectures reported in the literature, and achieves up to $20\%$ higher power efficiency than the state of the art.
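To make the ensembling step concrete, the following is a minimal software sketch of ensemble inference for a binary Bayesian layer. It is not the Bayes2IMC hardware method: a pseudorandom generator stands in for the PCM device noise used as the entropy source, and all names (`sample_binary_weights`, `ensemble_predict`, `predictive_entropy`) and the toy parameters are hypothetical. Each weight is assumed to be drawn from a Bernoulli distribution over {-1, +1}, and the predictive distribution is the average of the softmax outputs over the sampled networks.

```python
import math
import random


def sample_binary_weights(probs, rng):
    """Draw one {-1, +1} weight matrix; probs[i][j] is P(w_ij = +1)."""
    return [[1 if rng.random() < p else -1 for p in row] for row in probs]


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def ensemble_predict(probs, x, n_samples=100, seed=0):
    """Average the softmax outputs of n_samples sampled binary networks."""
    rng = random.Random(seed)  # software stand-in for a hardware entropy source
    mean = [0.0] * len(probs)
    for _ in range(n_samples):
        w = sample_binary_weights(probs, rng)
        logits = [sum(wij * xj for wij, xj in zip(row, x)) for row in w]
        for k, p in enumerate(softmax(logits)):
            mean[k] += p / n_samples
    return mean


def predictive_entropy(p):
    """Entropy (in nats) of the averaged predictive distribution."""
    return -sum(pk * math.log(pk) for pk in p if pk > 0.0)


# Hypothetical toy layer: 2 classes, 3 binary inputs.
probs = [[0.9, 0.8, 0.5], [0.1, 0.2, 0.5]]
x = [1, 1, -1]
p_mean = ensemble_predict(probs, x)
uncertainty = predictive_entropy(p_mean)
```

The entropy of the averaged prediction is one common uncertainty summary. The cost the abstract refers to is visible in the loop: every ensemble member needs a fresh draw of all the weights.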

Enabling High-Quality Uncertainty Quantification in a PIM Designed for Bayesian Neural Network

PIM-QAT: Neural Network Quantization for Processing-In-Memory (PIM) Systems

A noise-tolerant, resource-saving probabilistic binary neural network implemented by the SOT-MRAM compute-in-memory system

Block-Wise Mixed-Precision Quantization: Enabling High Efficiency for Practical ReRAM-based DNN Accelerators

PIMulator-NN: an Event-Driven, Cross-level Simulation Framework for Processing-In-Memory Based Neural Network Accelerators

SOT-MRAM-Enabled Probabilistic Binary Neural Networks for Noise-Tolerant and Fast Training

Reliability-Aware Training and Performance Modeling for Processing-In-Memory Systems

A principled distance-aware uncertainty quantification approach for enhancing the reliability of physics-informed neural network

Bayes2IMC: In-Memory Computing for Bayesian Binary Neural Networks

Neural-PIM: Efficient Processing-In-Memory with Neural Approximation of Peripherals

A Collaborative PIM Computing Optimization Framework for Multi-Tenant DNN

SEAL-lab Technical Report No. 2015-001 (April 29, 2016): Processing-in-Memory in ReRAM-based Main Memory

SEAL-lab Technical Report No. 2015-001 (November 30, 2015): Processing-in-Memory in ReRAM-based Main Memory

ReHy: A ReRAM-based Digital/Analog Hybrid PIM Architecture for Accelerating CNN Training

MNSIM 2.0: A Behavior-Level Modeling Tool for Processing-In-Memory Architectures

APIM: An Antiferromagnetic MRAM-Based Processing-In-Memory System for Efficient Bit-level Operations of Quantized Convolutional Neural Networks

Uncertainty quantification via a memristor Bayesian deep neural network for risk-sensitive reinforcement learning

An Improved RRAM-Based Binarized Neural Network with High Variation-Tolerated Forward/Backward Propagation Module

Mixed Precision Quantization for ReRAM-based DNN Inference Accelerators

VECOM: Variation Resilient Encoding and Offset Compensation Schemes for Reliable ReRAM Based DNN Accelerator

33.1 A 74 TMACS/W CMOS-RRAM Neurosynaptic Core with Dynamically Reconfigurable Dataflow and In-situ Transposable Weights for Probabilistic Graphical Models