NAF: Deeper Network/Accelerator Co-Exploration for Customizing CNNs on FPGA

Wenqi Lou, Jiaming Qian, Lei Gong, Xuan Wang, Chao Wang, Xuehai Zhou
DOI: https://doi.org/10.23919/DATE56975.2023.10137094
2023-01-01
Abstract: Recently, algorithm and hardware co-design for neural networks (NNs) has become the key to obtaining high-quality solutions. However, prior works lack consideration of the underlying hardware and thus suffer from a severely unbalanced neural architecture and hardware architecture search (NA-HAS) space on FPGAs, failing to unleash the performance potential. Nevertheless, a deeper joint search leads to a larger (multiplicative) search space, which makes the search highly challenging. To this end, we propose NAF, an efficient differentiable search framework that jointly searches the networks (e.g., operations and bitwidths) and accelerators (e.g., heterogeneous multicores and mappings) under a balanced NA-HAS space. Concretely, we design a coarse-grained, hardware-friendly quantization algorithm and integrate it at block granularity into the co-search process. Meanwhile, we design a highly optimized block processing unit (BPU) with configurable key dataflows. Afterward, a dynamic hardware generation algorithm based on modeling and heuristic rules performs the critical HAS and quickly generates hardware feedback. Experimental results show that, compared with previous state-of-the-art (SOTA) co-design works, NAF improves throughput by 1.99x to 6.84x on the Xilinx ZCU102 and energy efficiency by 17% to 88% under similar accuracy on the ImageNet dataset.