Generating Systolic Array Accelerators with Reusable Blocks

Liancheng Jia, Liqiang Lu, Xuechao Wei, Yun Liang
DOI: https://doi.org/10.1109/mm.2020.2997611
IF: 2.8212
2020-01-01
IEEE Micro
Abstract: Systolic array architecture is widely used in spatial hardware and is well suited to many tensor processing algorithms. Many systolic array architectures are implemented with a high-level synthesis (HLS) design flow. However, existing HLS tools do not favor modular and reusable design, which makes design iteration inefficient. In this article, we analyze the systolic array design space and identify the common structures of different systolic dataflows. We build hardware module templates using the Chisel infrastructure, which can be reused for different dataflows and computation algorithms. This markedly improves productivity in the development and optimization of systolic architectures. We further build a systolic array generator that transforms a tensor algorithm definition into a complete systolic hardware architecture. Experiments show that we can implement systolic array designs for different applications and dataflows with little engineering effort, and that their throughput exceeds that of HLS designs.