A 119.64 GOPs/W FPGA-Based ResNet50 Mixed-Precision Accelerator Using the Dynamic DSP Packing

Yaozhong Ou, Wei-Han Yu, Ka-Fai Un, Chi-Hang Chan, Yan Zhu
DOI: https://doi.org/10.1109/tcsii.2024.3377356
2024-01-01
Abstract: This paper presents a precision-sensitivity-aware quantization (PSAQ) mixed-precision (MP) compression scheme designed for both weights and activations. The PSAQ MP method achieves a better trade-off between accuracy and energy efficiency, maintaining 75.6% top-1 accuracy on ResNet-50 and achieving a 2.06× reduction in normalized operations with less than 1% accuracy difference compared with the baseline. We propose two DSP-pipeline-friendly methods, dynamic DSP packing (DDP) and fully pre-calibrated (FPC) unpacking, which pack multiple operations into a single DSP without error, at the cost of only one additional clock cycle and a slight logic overhead relative to the unpacked design. With these methods, the accelerator both supports MP algorithms and uses the DSP bandwidth efficiently. Combined with a router network and an optimized dataflow, our MP accelerator achieves 330.15 GOP/s throughput and 119.64 GOPs/W energy efficiency with 2.27-b weights and 3.61-b input feature maps (ifmaps).
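The abstract does not spell out how DDP or FPC unpacking work, so the following is only a minimal sketch of the generic idea behind DSP packing: two low-precision multiplications that share one operand are carried out by a single wide multiplication, and the two products are then separated with a sign correction. The 4-bit operand width, the 8-bit shift, and the names `SHIFT`, `packed_multiply`, and `unpack` are assumptions chosen for illustration and are not taken from the paper.

```python
# Illustrative sketch of generic DSP packing (not the paper's DDP/FPC method).
# Assumption: signed 4-bit operands, so each product fits in a signed 8-bit field.
SHIFT = 8  # hypothetical field width reserved for the low product


def packed_multiply(a: int, b: int, c: int, shift: int = SHIFT) -> int:
    """Compute a*c and b*c with one wide multiplication.

    (a * 2**shift + b) * c == (a*c) * 2**shift + b*c
    """
    packed = (a << shift) + b  # two operands share one multiplier input
    return packed * c          # the single "DSP" multiplication


def unpack(p: int, shift: int = SHIFT) -> tuple[int, int]:
    """Split the wide product back into (a*c, b*c)."""
    lo = p & ((1 << shift) - 1)   # low field as an unsigned value
    if lo >= 1 << (shift - 1):    # reinterpret the low field as signed
        lo -= 1 << shift
    hi = (p - lo) >> shift        # remove the low field's borrow, then shift
    return hi, lo


if __name__ == "__main__":
    # Exhaustive check over all signed 4-bit operand combinations.
    for a in range(-8, 8):
        for b in range(-8, 8):
            for c in range(-8, 8):
                assert unpack(packed_multiply(a, b, c)) == (a * c, b * c)
    print("all signed 4-bit cases match")
```

Subtracting the sign-extended low field before shifting is what keeps the result exact when the low product is negative; without that step the high product would be off by one in those cases. A hardware design would typically fold this correction into the pipeline rather than compute it separately as done here.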
engineering, electrical & electronic