Multi-clusters: an Efficient Design Paradigm of NN Accelerator Architecture Based on FPGA

Teng Wang, Lei Gong, Chao Wang, Yang, Yingxue Gao
DOI: https://doi.org/10.1007/978-3-031-21395-3_14
2022-01-01
Abstract: As neural network models continue to evolve, choosing a suitable platform for these compute-intensive applications is essential. The Field-Programmable Gate Array (FPGA) is gradually becoming an acceleration platform that balances power and performance. FPGA-based neural network accelerator architectures fall into two categories: stream and single-engine. Both design paradigms have advantages and disadvantages. The stream architecture achieves high performance more easily because it is customized to the model, but it has low kernel compatibility. The single-engine architecture is more flexible but incurs more scheduling overhead. Therefore, this work proposes a new design paradigm for FPGA-based neural network accelerators, called Multi-clusters (MC), which combines the characteristics of the two categories. We divide the original network model according to its computational features. Then, different cores are designed to map these parts separately for efficient execution. A fine-grained pipeline is implemented inside each core. The cores are coordinated by software scheduling, which implements a coarse-grained schedule and thereby improves overall computing performance. The experimental results show that the MC accelerator achieved a 39.7x improvement in performance and a 7.9x improvement in energy efficiency compared with CPU and GPU, and reached a peak computing performance of nearly 680.3 GOP/s.
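The partition-and-map idea in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the layer list, the layer kinds ("conv", "pool", "fc"), the core names, and both function names are hypothetical, chosen only to show how a model might be split by computational feature and then dispatched to dedicated cores by a software scheduler.

```python
from collections import defaultdict

# Hypothetical model description as (layer name, computational kind) pairs.
# These layers are illustrative and are not taken from the paper.
LAYERS = [
    ("conv1", "conv"),
    ("pool1", "pool"),
    ("conv2", "conv"),
    ("pool2", "pool"),
    ("fc1", "fc"),
    ("fc2", "fc"),
]


def partition_by_feature(layers):
    """Group layers by computational kind, keeping model order within each group."""
    clusters = defaultdict(list)
    for name, kind in layers:
        clusters[kind].append(name)
    return dict(clusters)


def schedule(layers):
    """Coarse-grained software schedule: dispatch each layer, in model order,
    to the (assumed) core dedicated to its computational kind."""
    return [(name, f"{kind}_core") for name, kind in layers]


if __name__ == "__main__":
    print(partition_by_feature(LAYERS))
    for layer, core in schedule(LAYERS):
        print(f"{layer} -> {core}")
```

In this sketch each group of layers stands for the workload of one core; the fine-grained pipeline inside a core is a hardware matter and is not modeled here.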