Abstract: Block-circulant matrix (BCM) compression has garnered much attention in the hardware acceleration of convolutional neural networks (CNNs) due to its regularity and efficiency. However, because the compression parameter space is difficult to explore, existing BCM-based methods often apply a single uniform compression parameter to every layer of a CNN model, sacrificing the flexibility of the compression. Additionally, optimizing the model and the accelerator independently makes it hard to reach the best tradeoff between model accuracy and hardware efficiency. To this end, we propose FlexBCM, a joint exploration framework that efficiently explores both the compression parameter space and the hardware parameter space to generate customized solutions, each pairing a hybrid BCM-compressed CNN with a field-programmable gate array (FPGA) accelerator. On the algorithmic side, leveraging the idea of neural architecture search (NAS), we design an efficient differentiable sampling method to rapidly evaluate the accuracy of candidate subnets. Additionally, we devise a hardware-friendly frequency-domain quantization scheme for BCM computation. On the hardware side, we develop an efficient, parameter-configurable convolutional core (ConvPU) alongside a BCM computing core (BCMPU). The BCMPU flexibly accommodates different compression parameters at runtime and incorporates complex-number DSP packing and conjugate symmetry optimizations. For model-to-hardware evaluation, we construct accurate latency and resource consumption models. Moreover, we design a fast hardware generation algorithm based on coarse-grained search to provide prompt hardware evaluation feedback for the current subnet. Finally, we validate FlexBCM on the Xilinx ZCU102 FPGA and compare its compressed CNN-accelerator solutions with previous state-of-the-art works. Experimental results demonstrate that FlexBCM achieves 1.21–3.02 times higher computational efficiency for the ResNet18 and ResNet34 models while maintaining an acceptable accuracy loss on the ImageNet dataset.
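The abstract does not spell out how a BCM layer is computed, so the following is only a minimal sketch of the standard identity that BCM methods rely on: multiplying by a circulant block equals an element-wise product in the frequency domain, and for real inputs only about half of the spectrum needs to be multiplied because of conjugate symmetry. All names here (`dft`, `circulant_matvec_freq`, `bcm_matvec`, the block size `b`) are illustrative assumptions, not FlexBCM's implementation; a naive DFT is used in place of an FFT to keep the example dependency-free.

```python
import cmath

def dft(x, inverse=False):
    """Naive O(n^2) DFT; stands in for an FFT in this sketch."""
    n = len(x)
    sign = 1 if inverse else -1
    out = []
    for k in range(n):
        s = sum(x[t] * cmath.exp(sign * 2j * cmath.pi * k * t / n)
                for t in range(n))
        out.append(s / n if inverse else s)
    return out

def circulant_matvec_direct(c, x):
    """Reference product with the circulant matrix whose first column is c."""
    n = len(c)
    return [sum(c[(i - j) % n] * x[j] for j in range(n)) for i in range(n)]

def circulant_matvec_freq(c, x):
    """Same product via the frequency domain, for real-valued c and x."""
    n = len(c)
    C, X = dft(c), dft(x)
    half = n // 2 + 1
    # Multiply only the first n//2 + 1 bins ...
    Y = [C[k] * X[k] for k in range(half)]
    # ... and rebuild the rest from conjugate symmetry: Y[k] = conj(Y[n-k]).
    Y += [Y[n - k].conjugate() for k in range(half, n)]
    return [v.real for v in dft(Y, inverse=True)]

def bcm_matvec(blocks, x, b):
    """blocks[p][q] is the first column of the (p, q) circulant block of size b."""
    out = []
    for row in blocks:
        acc = [0.0] * b
        for q, c in enumerate(row):
            y = circulant_matvec_freq(c, x[q * b:(q + 1) * b])
            acc = [a + v for a, v in zip(acc, y)]
        out.extend(acc)
    return out
```

In this sketch each b-by-b block stores only b values instead of b squared, which is where the compression comes from; a larger `b` gives more compression. A practical design would typically accumulate the block products in the frequency domain and apply a single inverse transform per output block, which this example omits for clarity.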

Flexible-width Bit-level Compressor for Convolutional Neural Network

Memory-Efficient Compression Based on Least-Squares Fitting in Convolutional Neural Network Accelerators

Accelerating Low Bit-Width Convolutional Neural Networks with Embedded FPGA

A Low Bit-Width Parameter Representation Method for Hardware-Oriented Convolution Neural Networks

Using Data Compression for Optimizing FPGA-Based Convolutional Neural Network Accelerators

An algorithm/hardware co‐optimized method to accelerate CNNs with compressed convolutional weights on FPGA

Energy-Efficient Architecture for FPGA-based Deep Convolutional Neural Networks with Binary Weights

An Efficient Streaming Accelerator for Low Bit-Width Convolutional Neural Networks

An End-to-End Compression Framework Based on Convolutional Neural Networks

Focused Quantization for Sparse CNNs

WRA-MF: A Bit-Level Convolutional-Weight-Decomposition Approach to Improve Parallel Computing Efficiency for Winograd-Based CNN Acceleration

High Performance CNN Accelerators Based on Hardware and Algorithm Co-Optimization

A Computationally Efficient Neural Video Compression Accelerator Based on a Sparse CNN-Transformer Hybrid Network

Weightless: Lossy Weight Encoding For Deep Neural Network Compression

An Efficient Hardware Accelerator for Sparse Convolutional Neural Networks on FPGAs

Ultra High Fidelity Deep Image Decompression With l∞-Constrained Compression

A hardware-friendly logarithmic quantization method for CNNs and FPGA implementation

SpWA: An Efficient Sparse Winograd Convolutional Neural Networks Accelerator on FPGAs

Smilodon: an Efficient Accelerator for Low Bit-Width CNNs with Task Partitioning

FlexBCM: Hybrid Block-Circulant Neural Network and Accelerator Co-Search on FPGAs

A Flexible and Efficient FPGA Accelerator for Various Large-Scale and Lightweight CNNs