BHNN: A Memory-Efficient Accelerator for Compressing Deep Neural Networks with Blocked Hashing Techniques

Jingyang Zhu, Zhiliang Qian, Chi-Ying Tsui
DOI: https://doi.org/10.1109/aspdac.2017.7858404
2017-01-01
Abstract: In this paper, we propose a novel algorithm that compresses neural networks with blocked hashing techniques to reduce their memory requirements. Adding blocked constraints on top of the conventional hashing technique maintains the test error rate while preserving the spatial locality of the computations. With this scheme, the synaptic connections are compressed by at least an order of magnitude (10×) compared with the plain neural network, with virtually no loss in prediction accuracy. Compared with other compression techniques, the proposed algorithm achieves the best performance in the heavy-compression regime. The blocked hashing techniques are also hardware friendly: they allow the memory hierarchy of the hardware architecture to be implemented efficiently. To demonstrate the hardware efficiency, we implement the hardware architecture of deep neural networks using the proposed blocked hashing techniques on a Xilinx Virtex-7 FPGA board. With a hardware parallelism of 32, the accelerator achieves a speed-up of 22× over the CPU and 3× to 5× over the GPU in the inference phase.
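The abstract does not spell out the hashing scheme, so the following is only a minimal sketch of one plausible reading: each weight position is hashed into a small table of shared values, and the blocked constraint restricts every position to the table belonging to its own block of the weight matrix. The function names, the CRC32 hash, the block shape, and the table size are all illustrative assumptions, not details taken from the paper.

```python
import zlib


def bucket_index(row, col, block_rows, block_cols, buckets_per_block):
    """Map weight position (row, col) to (block id, slot in that block's table).

    Assumption: CRC32 stands in for whatever hash function the paper uses.
    """
    block = (row // block_rows, col // block_cols)
    key = f"{block[0]},{block[1]},{row % block_rows},{col % block_cols}"
    slot = zlib.crc32(key.encode()) % buckets_per_block
    return block, slot


def compression_ratio(block_rows, block_cols, buckets_per_block):
    """Virtual weights per block divided by stored weights per block."""
    return (block_rows * block_cols) / buckets_per_block


def blocked_hashed_matvec(x, shared, n_out, block_rows, block_cols,
                          buckets_per_block):
    """Compute y = W x where W is never stored explicitly.

    shared[block] is the short list of real weights for that block; every
    virtual weight in the block reads one of those entries, so all lookups
    for a block stay inside one small, contiguous table.
    """
    y = [0.0] * n_out
    for i in range(n_out):
        for j in range(len(x)):
            block, slot = bucket_index(i, j, block_rows, block_cols,
                                       buckets_per_block)
            y[i] += shared[block][slot] * x[j]
    return y


# Hypothetical layer: 8 inputs, 4 outputs, 2x4 blocks, 2 stored weights each.
n_in, n_out = 8, 4
block_rows, block_cols, buckets = 2, 4, 2
shared = {
    (bi, bj): [0.5, -0.25]
    for bi in range(n_out // block_rows)
    for bj in range(n_in // block_cols)
}
y = blocked_hashed_matvec([1.0] * n_in, shared, n_out,
                          block_rows, block_cols, buckets)
print(len(y), compression_ratio(block_rows, block_cols, buckets))
```

In this sketch the block constraint is what keeps memory accesses local: a hardware unit processing one block only needs that block's small table on chip, which is presumably the spatial-locality property the abstract refers to.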