PQ-CNN: Accelerating Product Quantized Convolutional Neural Network on FPGA

Jialiang Zhang, Jing Li
DOI: https://doi.org/10.1109/FCCM.2018.00041
2018-01-01
Abstract: This work presents an efficient CNN computation framework on FPGA that utilizes Product Quantization (PQ). Compared to other compression methods, PQ achieves larger compression ratios and also alleviates the irregularity problem. However, its algorithmic benefits do not translate to system performance gains because of: 1) a large codebook that diminishes the compression ratio; 2) large numbers of look-up operations that are inefficient on CPU and GPU architectures. In this work, to address these problems, we first provide an analytical model to guide our design and find a dilemma in selecting PQ parameters. Then, we propose a software/hardware method to tackle these issues. We present a complete framework to optimally implement PQ-CNN on FPGA. According to our experimental results, we achieve 140 Tops equivalent throughput and 475 Gops/W energy efficiency with less than 0.5% accuracy degradation.
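To make the two obstacles named in the abstract concrete, the following is a minimal Python sketch of generic product quantization applied to a weight matrix. It is not the paper's analytical model or its FPGA design. The function names `pq_compression_ratio` and `pq_matvec`, the one-codebook-per-subspace layout, and the 32-bit weight width are all assumptions made for illustration. The first function counts codebook storage against the compressed indices, and the second replaces multiply-accumulates with table look-ups.

```python
def pq_compression_ratio(rows, cols, sub_dim, num_codewords, weight_bits=32):
    """Dense storage divided by PQ storage (indices plus codebooks).

    Assumed layout (hypothetical, for illustration): each row is split into
    cols // sub_dim subvectors, and every subspace has its own codebook of
    num_codewords entries, each holding sub_dim values of weight_bits bits.
    """
    if cols % sub_dim != 0:
        raise ValueError("cols must be divisible by sub_dim")
    num_subspaces = cols // sub_dim
    # Bits needed to address one codeword in a codebook.
    index_bits = (num_codewords - 1).bit_length()
    dense_bits = rows * cols * weight_bits
    index_storage = rows * num_subspaces * index_bits
    codebook_storage = num_subspaces * num_codewords * sub_dim * weight_bits
    return dense_bits / (index_storage + codebook_storage)


def pq_matvec(x, codebooks, codes):
    """Compute W @ x where W is stored only as codeword indices.

    codebooks[s][k] is codeword k of subspace s (a list of sub_dim values).
    codes[r][s] is the codeword index used by row r in subspace s.
    """
    sub_dim = len(codebooks[0][0])
    # One table per subspace: the dot product of every codeword with the
    # matching slice of x. Each entry is computed once and shared by all rows.
    tables = []
    for s, book in enumerate(codebooks):
        x_s = x[s * sub_dim:(s + 1) * sub_dim]
        tables.append([sum(c * v for c, v in zip(word, x_s)) for word in book])
    # Each output element is now a sum of look-ups, one per subspace.
    return [sum(tables[s][k] for s, k in enumerate(row)) for row in codes]


if __name__ == "__main__":
    # Same matrix shape, two codebook sizes: the larger codebook dominates storage.
    print(pq_compression_ratio(1024, 1024, sub_dim=4, num_codewords=256))
    print(pq_compression_ratio(1024, 1024, sub_dim=4, num_codewords=65536))
```

Under these assumptions, growing the codebook from 256 to 65536 entries makes the codebook itself larger than the dense matrix, which illustrates why codebook size works against the compression ratio. The inner loop of `pq_matvec` consists entirely of indexed reads, the access pattern the abstract describes as inefficient on CPU and GPU architectures.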