Edge FPGA-based Onsite Neural Network Training

Ruiqi Chen, Haoyang Zhang, Yu Li, Runzhou Zhang, Guoyu Li, Jun Yu, Kun Wang
DOI: https://doi.org/10.1109/ISCAS46773.2023.10181582
2023-01-01
Abstract: Conjugate gradient (CG) is widely used in training sparse neural networks. However, CG involves a large number of sparse matrix and vector operations and cannot be implemented efficiently on resource-limited edge devices. In this paper, a high-performance and energy-efficient CG accelerator implemented on an edge Field Programmable Gate Array (FPGA) is proposed for fast onsite neural network training. Based on profiling results, we propose a unified matrix multiplier that is compatible with both sparse and dense matrices. We also design a novel T-engine to handle the transpose operation in the compressed sparse format. Experimental results show that our proposal outperforms the state-of-the-art FPGA work with a resource reduction of up to 41.3%. In addition, we achieve on average 10.2x and 2.0x speedup and 10.1x and 3.5x better energy efficiency than implementations on CPU and GPU, respectively.
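For orientation, the operations the abstract names (sparse matrix-vector products, a transpose product on a compressed format, and the CG iteration itself) can be sketched in software as follows. This is only an illustrative sketch, not the accelerator design or the T-engine from the paper: the abstract does not say which compressed format is used, so CSR is an assumption here, and all function names (`dense_to_csr`, `csr_matvec`, `csr_transpose_matvec`, `conjugate_gradient`) are hypothetical.

```python
from typing import List, Sequence, Tuple

# Assumed CSR layout: (nonzero values, column indices, row pointers).
CSR = Tuple[List[float], List[int], List[int]]


def dense_to_csr(a: Sequence[Sequence[float]]) -> CSR:
    """Compress a dense row-major matrix into CSR arrays."""
    values, cols, row_ptr = [], [], [0]
    for row in a:
        for j, v in enumerate(row):
            if v != 0.0:
                values.append(v)
                cols.append(j)
        row_ptr.append(len(values))
    return values, cols, row_ptr


def csr_matvec(m: CSR, x: Sequence[float]) -> List[float]:
    """Compute y = A x by walking each row's nonzeros."""
    values, cols, row_ptr = m
    y = [0.0] * (len(row_ptr) - 1)
    for i in range(len(y)):
        for k in range(row_ptr[i], row_ptr[i + 1]):
            y[i] += values[k] * x[cols[k]]
    return y


def csr_transpose_matvec(m: CSR, x: Sequence[float], n_cols: int) -> List[float]:
    """Compute y = A^T x without building A^T: scatter each nonzero
    into the output slot given by its column index."""
    values, cols, row_ptr = m
    y = [0.0] * n_cols
    for i in range(len(row_ptr) - 1):
        for k in range(row_ptr[i], row_ptr[i + 1]):
            y[cols[k]] += values[k] * x[i]
    return y


def dot(u: Sequence[float], v: Sequence[float]) -> float:
    return sum(a * b for a, b in zip(u, v))


def conjugate_gradient(m: CSR, b: Sequence[float],
                       tol: float = 1e-20, max_iter: int = 100) -> List[float]:
    """Textbook CG for A x = b, with A symmetric positive definite."""
    x = [0.0] * len(b)
    r = list(b)          # residual b - A x, with x = 0
    p = list(r)          # search direction
    rs_old = dot(r, r)
    for _ in range(max_iter):
        if rs_old < tol:  # squared residual norm small enough
            break
        ap = csr_matvec(m, p)
        alpha = rs_old / dot(p, ap)
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, ap)]
        rs_new = dot(r, r)
        p = [ri + (rs_new / rs_old) * pi for ri, pi in zip(r, p)]
        rs_old = rs_new
    return x
```

Each CG iteration is dominated by one matrix-vector product plus a few vector updates, which is consistent with the abstract's point that sparse matrix and vector operations make up most of the workload. The transpose product shows why a transpose on a compressed format needs special handling: the access pattern turns from row-wise gathers into column-indexed scatters.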