Abstract: In response to the escalating demand for hardware-efficient Deep Neural Network (DNN) architectures, we present a novel quantize-enabled multiply-accumulate (MAC) unit. Our methodology performs the MAC operation with a right shift-and-add computation, which enables runtime truncation without additional hardware. The architecture uses hardware resources efficiently, improving throughput while reducing computational complexity through bit truncation. The core of the methodology is a hardware-efficient MAC algorithm that supports both iterative and pipelined implementations, so an accelerator can favor either hardware efficiency or throughput. Additionally, we introduce a processing element (PE) with a bias pre-loading scheme that saves one clock cycle of delay and eliminates the extra resources conventionally required in a PE implementation. The PE performs quantization-based MAC calculations through an efficient bit-truncation method, with no need for extra hardware logic. This versatile PE accommodates variable bit precision with a dynamic fraction part in the sfxpt<N,f> representation, so the format can be matched to the demands of a specific model or layer. In software emulation, the proposed approach shows minimal accuracy loss relative to conventional float32-based implementations: under 1.6% for LeNet-5 on MNIST and around 4% for ResNet-18 and VGG-16 on CIFAR-10 in the sfxpt<8,5> format. Hardware results on a Xilinx Virtex-7 board show a 37% reduction in area utilization and a 45% reduction in power consumption compared to the best state-of-the-art MAC architecture. Extending the proposed MAC to a LeNet DNN model yields a 42% reduction in resource requirements and a 27% reduction in delay. This architecture provides notable advantages for resource-efficient, high-throughput edge-AI applications.
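
To make the right shift-and-add idea concrete, the following is a minimal software sketch, not the authors' hardware design. It assumes that sfxpt<N,f> denotes a signed fixed-point value with N total bits and f fractional bits, that signs are handled separately from magnitudes (so truncation is toward zero), and that the accumulator is wide enough that no saturation is needed. The function names `quantize`, `shift_add_mul`, and `mac` are hypothetical and chosen for illustration only.

```python
def quantize(x, n_bits=8, frac_bits=5):
    """Map a float to a signed fixed-point integer (assumed sfxpt<N,f> layout)."""
    lo, hi = -(1 << (n_bits - 1)), (1 << (n_bits - 1)) - 1
    # Scale by 2^f, round to nearest integer, then saturate to the N-bit range.
    return max(lo, min(hi, round(x * (1 << frac_bits))))


def shift_add_mul(a, b, n_bits=8, frac_bits=5):
    """Multiply two fixed-point integers with right shift-and-add.

    One multiplier bit is examined per iteration (LSB first). The aligned
    multiplicand is added when the bit is set, then the accumulator is
    shifted right by one. The bits that fall off during the shifts are the
    truncated fraction bits, so no separate truncation step is needed.
    """
    sign = -1 if (a < 0) != (b < 0) else 1
    ma, mb = abs(a), abs(b)
    # Pre-align the multiplicand so that after n_bits right shifts the
    # result keeps exactly frac_bits fractional bits.
    addend = ma << (n_bits - frac_bits)
    acc = 0
    for i in range(n_bits):
        if (mb >> i) & 1:
            acc += addend
        acc >>= 1  # right shift drops the lowest bit (truncation)
    return sign * acc


def mac(weights, inputs, bias, n_bits=8, frac_bits=5):
    """Accumulate products, starting from a pre-loaded bias (assumed scheme)."""
    acc = bias  # the bias is loaded first, so no separate bias-add step follows
    for w, x in zip(weights, inputs):
        acc += shift_add_mul(w, x, n_bits, frac_bits)
    return acc


if __name__ == "__main__":
    w = [quantize(0.5), quantize(-0.25)]
    x = [quantize(0.25), quantize(1.0)]
    result = mac(w, x, bias=quantize(1.0))
    print(result, result / (1 << 5))
```

In this sketch the per-iteration right shift gives the same result as computing the full magnitude product and discarding its lowest f bits, which is one way to read the claim that truncation comes at no extra hardware cost. A real PE would bound the accumulator width and may handle signs differently.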

QuantMAC: Enhancing Hardware Performance in DNNs With Quantize Enabled Multiply-Accumulate Unit
