An OpenCL™ Implementation of WebP Accelerator on FPGAs

Zhenhua Guo, Baoyu Fan, Yaqian Zhao, Xuelei Li, Shixin Wei, Long Li
DOI: https://doi.org/10.1007/978-3-319-78890-6_46
2018-01-01
Abstract: With the development of cloud computing, the super-large scale of image data has brought severe challenges for storage cost and network bandwidth in data centers. To alleviate this situation effectively, WebP has replaced the current mainstream image file format due to its better compression efficiency. In this paper, we provide an OpenCL implementation of a WebP accelerator on FPGAs to optimize the performance of the WebP lossy compression algorithm. Our accelerator uses a heavily pipelined custom hardware implementation to achieve a high throughput of ~450 MPixel/s. The performance-per-watt of our OpenCL implementation on Intel's Arria 10 device is 8.32x better than that of a highly tuned CPU implementation on an Intel Xeon E5-2690v3 with 24 thread cores. Additionally, the delay time per image can be reduced to ~90% through data parallelism and macroblock pipelining on FPGAs. Overall, our OpenCL™ implementation of the WebP accelerator on FPGAs is a more competitive option for data centers seeking higher performance and lower cost.