A High-Performance Accelerator for Real-Time Super-Resolution on Edge FPGAs
Hongduo Liu,Yijian Qian,Youqiang Liang,Bin Zhang,Zhaohan Liu,Tao He,Wenqian Zhao,Jiangbo Lu,Bei Yu
DOI: https://doi.org/10.1145/3652855
IF: 1.447
2024-03-16
ACM Transactions on Design Automation of Electronic Systems
Abstract:In the digital era, the prevalence of low-quality images contrasts with the widespread use of high-definition displays, primarily due to low-resolution cameras and compression technologies. Image super-resolution (SR) techniques, particularly those leveraging deep learning, aim to enhance these images for high-definition presentation. However, real-time execution of deep neural network (DNN)-based SR methods at the edge poses challenges due to their high computational and storage requirements. To address this, field-programmable gate arrays (FPGAs) have emerged as a promising platform, offering flexibility, programmability, and adaptability to evolving models. Previous FPGA-based SR solutions have focused on reducing computational and memory costs through aggressive simplification techniques, often sacrificing the quality of the reconstructed images. This paper introduces a novel SR network specifically designed for edge applications, which maintains reconstruction performance while managing computation costs effectively. Additionally, we propose an architectural design that enables the real-time and end-to-end inference of the proposed SR network on embedded FPGAs. Our key contributions include a tailored SR algorithm optimized for embedded FPGAs, a DSP-enhanced design that achieves a significant four-fold speedup, a novel scalable cache strategy for handling large feature maps, optimization of DSP cascade consumption, and a constraint optimization approach for resource allocation. Experimental results demonstrate that our FPGA-specific accelerator surpasses existing solutions, delivering superior throughput, energy efficiency, and image quality.
computer science, software engineering, hardware & architecture