Abstract: The record-breaking performance of deep neural networks (DNNs) comes with heavy parameter budgets, which require external dynamic random access memory (DRAM) for storage. The prohibitive energy cost of DRAM accesses makes it nontrivial to deploy DNNs on resource-constrained devices, calling for minimizing the movement of weights and data in order to improve energy efficiency. Driven by this critical bottleneck, we present SmartDeal, a hardware-friendly algorithm framework that trades higher-cost memory storage/access for lower-cost computation in order to aggressively boost storage and energy efficiency for both DNN inference and training. The core technique of SmartDeal is a novel DNN weight matrix decomposition framework with a distinct structural constraint on each matrix factor, carefully crafted to unleash the hardware-aware efficiency potential. Specifically, we decompose each weight tensor as the product of a small basis matrix and a large, structurally sparse coefficient matrix whose nonzero elements are readily quantized to powers of 2. The resulting sparse and readily quantized DNNs enjoy greatly reduced energy consumption in data movement as well as weight storage, while incurring minimal overhead to recover the original weights, since recovery requires only sparse bit-operations and cost-favorable computations. Beyond inference, we take another leap to embrace energy-efficient training by introducing several customized techniques that address the unique roadblocks arising in training while preserving the SmartDeal structures. We also design a dedicated hardware accelerator that fully utilizes the new weight structure to improve real energy efficiency and latency. We conduct experiments on both vision and language tasks, with nine models, four datasets, and three settings (inference-only, adaptation, and fine-tuning).
Our extensive results show that 1) when applied to inference, SmartDeal achieves up to 2.44× improvement in energy efficiency, as evaluated using real hardware implementations, and 2) when applied to training, SmartDeal leads to 10.56× and 4.48× reductions in storage and training energy cost, respectively, with usually negligible accuracy loss, compared to state-of-the-art training baselines. Our source code is available at: https://github.com/VITA-Group/SmartDeal.
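
To make the decomposition described in the abstract concrete, here is a minimal NumPy sketch of factoring a weight matrix into a small basis matrix and a structurally sparse coefficient matrix with power-of-2 nonzeros. It is an illustration under stated assumptions, not the authors' implementation. The function names, the factor order (coefficient times basis), the row-wise pruning used as the structural constraint, the exponent range, and the simple alternating least-squares loop are all hypothetical choices made for this example.

```python
import numpy as np


def quantize_pow2(x, min_exp=-8, max_exp=0):
    """Round each nonzero entry to a signed power of 2 (nearest in the log domain).

    The exponent range [min_exp, max_exp] is an assumption for this sketch.
    """
    out = np.zeros_like(x, dtype=np.float64)
    nz = x != 0
    exp = np.clip(np.round(np.log2(np.abs(x[nz]))), min_exp, max_exp)
    out[nz] = np.sign(x[nz]) * np.exp2(exp)
    return out


def prune_rows(c, keep_ratio):
    """Structured sparsity (assumed form): keep only the rows with the largest L2 norm."""
    norms = np.linalg.norm(c, axis=1)
    k = max(1, int(round(keep_ratio * c.shape[0])))
    keep = np.argsort(norms)[-k:]
    pruned = np.zeros_like(c)
    pruned[keep] = c[keep]
    return pruned


def decompose(w, keep_ratio=0.5, iters=5):
    """Approximate w (m x n) as coeff (m x n, sparse, power-of-2) @ basis (n x n).

    Hypothetical alternating heuristic: fit the coefficients for a fixed basis,
    project them onto the constraints, then refit the basis by least squares.
    """
    basis = np.eye(w.shape[1])
    for _ in range(iters):
        coeff = w @ np.linalg.pinv(basis)
        coeff = quantize_pow2(prune_rows(coeff, keep_ratio))
        basis = np.linalg.lstsq(coeff, w, rcond=None)[0]
    return coeff, basis


def reconstruct(coeff, basis):
    """Recover approximate weights; with power-of-2 coefficients, each product
    could be realized as a bit shift in hardware."""
    return coeff @ basis


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    w = rng.standard_normal((8, 4))
    coeff, basis = decompose(w, keep_ratio=0.5, iters=3)
    err = np.linalg.norm(w - reconstruct(coeff, basis)) / np.linalg.norm(w)
    print("nonzero rows:", int(np.count_nonzero(np.abs(coeff).sum(axis=1))))
    print("relative reconstruction error:", err)
```

Only the small basis matrix and the nonzero coefficient exponents would need to be stored in this sketch, which is the storage-for-computation trade the abstract describes. The reconstruction error depends heavily on the pruning ratio and on the weights themselves, so the printed value is only indicative.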

BEM: Bit-level Sparsity-aware Deep Learning Accelerator with Efficient Booth Encoding and Weight Multiplexing

Bit-Offsetter: A Bit-serial DNN Accelerator with Weight-offset MAC for Bit-wise Sparsity Exploitation

Bit-balance: Model-Hardware Co-design for Accelerating NNs by Exploiting Bit-level Sparsity

Exploiting Bit Sparsity in Both Activation and Weight in Neural Networks Accelerators

Elastic Significant Bit Quantization and Acceleration for Deep Neural Networks

BBS: Bi-directional Bit-level Sparsity for Deep Learning Acceleration

EBSP: Evolving Bit Sparsity Patterns for Hardware-Friendly Inference of Quantized Deep Neural Networks

Woodpecker-DL: Accelerating Deep Neural Networks via Hardware-Aware Multifaceted Optimizations

A Low-Power Sparse Convolutional Neural Network Accelerator with Pre-Encoding Radix-4 Booth Multiplier

A Precision-Scalable Deep Neural Network Accelerator with Activation Sparsity Exploitation

SmartDeal: Remodeling Deep Network Weights for Efficient Inference and Training

An Efficient Hardware Architecture for DNN Training by Exploiting Triple Sparsity

An Energy-Efficient Bagged Binary Neural Network Accelerator

Exploiting Dynamic Bit Sparsity in Activation for Deep Neural Network Acceleration

Weight Block Sparsity: Training, Compilation, and AI Engine Accelerators

UNPU: A 50.6TOPS/W unified deep neural network accelerator with 1b-to-16b fully-variable weight bit-precision

SPARK: Scalable and Precision-Aware Acceleration of Neural Networks Via Efficient Encoding

Accelerated Inference Framework of Sparse Neural Network Based on Nested Bitmask Structure

Accelerating Deep Neural Networks by Combining Block-Circulant Matrices and Low-Precision Weights

Energy-Efficient Architecture for FPGA-based Deep Convolutional Neural Networks with Binary Weights

EncodingNet: A Novel Encoding-based MAC Design for Efficient Neural Network Acceleration