DLBooster

Cheng Yang, Dan Li, Zhiyuan Guo, Binyao Jiang, Jiaxin Lin, Xi Fan, Jinkun Geng, Xinyi Yu, Wei Bai, Lei Qu, Ran Shu, Peng Cheng, Yongqiang Xiong, Jianping Wu
DOI: https://doi.org/10.1145/3337821.3337892
2019-01-01
Abstract: In recent years, deep learning (DL) has prospered again thanks to improvements in both computing and learning theory. Most emerging studies focus on accelerating the refinement of DL models but ignore data preprocessing. However, data preprocessing can significantly affect the overall performance of end-to-end DL workflows. Our studies on several image DL workloads show that existing preprocessing backends are quite inefficient: they either deliver poor throughput (30% degradation) or consume too many CPU cores (more than 10). Based on these observations, we propose DLBooster, a high-performance data preprocessing pipeline that selectively offloads key workloads to FPGAs to meet the stringent data preprocessing demands of cutting-edge DL applications. Our testbed experiments show that, compared with the existing baselines, DLBooster achieves 1.35× to 2.4× the image processing throughput in several DL workloads while consuming only one tenth of the CPU cores. It also reduces latency by one third in online image inference.