Mobile or FPGA? A Comprehensive Evaluation on Energy Efficiency and a Unified Optimization Framework

Geng Yuan, Peiyan Dong, Mengshu Sun, Wei Niu, Zhengang Li, Yuxuan Cai, Yanyu Li, Jun Liu, Weiwen Jiang, Xue Lin, Bin Ren, Xulong Tang, Yanzhi Wang
DOI: https://doi.org/10.1145/3528578
2022-04-20
ACM Transactions on Embedded Computing Systems
Abstract: Efficient deployment of Deep Neural Networks (DNNs) on edge devices (i.e., FPGAs and mobile platforms) is very challenging, especially given the recent growth in DNN model size and complexity. Model compression strategies, including weight quantization and pruning, are widely recognized as effective approaches to significantly reduce computation and memory intensity, and have been implemented in many DNNs on edge devices. However, most state-of-the-art works focus on ad-hoc optimizations, and a thorough study that comprehensively reveals the potential and constraints of different edge devices under different compression strategies is still lacking. In this paper, we qualitatively and quantitatively compare the energy efficiency of FPGA-based and mobile-based DNN execution, the latter using a mobile GPU, and provide a detailed analysis. Based on the observations obtained from this analysis, we propose a unified optimization framework that uses block-based pruning to reduce weight storage and accelerate inference on mobile devices and FPGAs, achieving high hardware performance and energy-efficiency gains while maintaining accuracy.
computer science, software engineering, hardware & architecture