Understanding Performance Differences of FPGAs and GPUs

Jason Cong, Zhenman Fang, Michael Lo, Hanrui Wang, Jingxian Xu, Shaochong Zhang
DOI: https://doi.org/10.1109/fccm.2018.00023
2018-04-01
Abstract: This paper aims to better understand the performance differences between FPGAs and GPUs. We intentionally begin with a widely used GPU-friendly benchmark suite, Rodinia, and port 15 of its kernels onto FPGAs using HLS C. We then propose an analytical model to compare their performance. We find that for 6 of the 15 ported kernels, today's FPGAs can provide comparable or even better performance than the GPU, while consuming an average of 28% of the GPU power. In addition to running at a lower clock frequency, FPGAs usually achieve a higher number of operations per cycle in each customized deep pipeline but a lower effective parallel factor, owing to their far lower off-chip memory bandwidth. With 4x more memory bandwidth, 8 of the 15 FPGA kernels are projected to achieve at least half of the GPU kernel performance.
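The three factors named in the abstract (clock frequency, operations per cycle per pipeline, and effective parallel factor) can be combined into a rough throughput estimate. The sketch below is only an illustration of that decomposition, not the paper's actual model: the multiplicative form, the bandwidth cap in `effective_parallel_factor`, the function names, and every numeric value are assumptions chosen for the example.

```python
def effective_parallel_factor(max_pipelines, bandwidth_gbs, demand_gbs_per_pipeline):
    """Pipelines that can be kept busy at once.

    Assumption (not from the paper): off-chip bandwidth caps the number of
    active pipelines at bandwidth / per-pipeline demand.
    """
    return min(max_pipelines, bandwidth_gbs / demand_gbs_per_pipeline)


def throughput_gops(freq_ghz, ops_per_cycle_per_pipeline, parallel_factor):
    """Assumed model: throughput = frequency x ops/cycle/pipeline x parallel factor."""
    return freq_ghz * ops_per_cycle_per_pipeline * parallel_factor


# Hypothetical FPGA design: 16 deep pipelines, each doing 64 ops per cycle
# at 0.2 GHz and each needing 2.5 GB/s of off-chip traffic.
baseline = throughput_gops(
    0.2, 64, effective_parallel_factor(16, bandwidth_gbs=10.0, demand_gbs_per_pipeline=2.5)
)

# Same design with 4x the off-chip bandwidth (hypothetical 40 GB/s).
boosted = throughput_gops(
    0.2, 64, effective_parallel_factor(16, bandwidth_gbs=40.0, demand_gbs_per_pipeline=2.5)
)

print(f"baseline: {baseline:.1f} Gops/s, with 4x bandwidth: {boosted:.1f} Gops/s")
```

In this toy setting the design is bandwidth-bound, so quadrupling the bandwidth raises the effective parallel factor and the estimated throughput with it; once the cap reaches `max_pipelines`, extra bandwidth would no longer help.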