Calabash: Accelerating Attention Using a Systolic Array Chain on FPGAs

Zizhang Luo, Liqiang Lu, Yicheng Jin, Liancheng Jia, Yun Liang
DOI: https://doi.org/10.1109/fpl60245.2023.00041
2023-01-01
Abstract: In recent years, the attention mechanism has achieved remarkable performance in natural language processing and computer vision applications, at the expense of high computation cost. FPGAs have been demonstrated to be an effective hardware platform for various AI applications. However, the attention mechanism involves complex data dependencies, which make FPGA acceleration difficult. In this paper, we propose Calabash, an FPGA accelerator for attention-based applications. We design a chain of two systolic arrays that apply the same dataflow. Then, we design two scheduling techniques for different matrices to ensure that the intermediate matrix can be cached in on-chip memory. Finally, we develop analytical models for resource utilization estimation, workload balancing, and latency prediction to guide design space exploration. Experiments show that Calabash achieves 1.76 TOP/s and 1.06 TOP/s on the Xilinx VU9P and ZCU102 platforms, respectively, yielding an average 50.1X and 3.94X energy-efficiency improvement compared with CPU and GPU, respectively.
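The data dependency mentioned in the abstract can be illustrated with a small sketch. Attention computes two matrix products in sequence, with a row-wise softmax between them, so the second product cannot start until the intermediate score matrix is available. The code below is only an illustration under stated assumptions, not the Calabash design: the function names, the `block_rows` parameter, and the row-blocking strategy are hypothetical, and the abstract does not specify the paper's actual scheduling techniques. It shows one common way to keep the intermediate matrix small, by processing a few query rows at a time through both stages.

```python
import math

def transpose(m):
    """Return the transpose of a list-of-lists matrix."""
    return [list(col) for col in zip(*m)]

def matmul(a, b):
    """Plain dense matrix product of two list-of-lists matrices."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def softmax_rows(m):
    """Numerically stable softmax applied independently to each row."""
    out = []
    for row in m:
        peak = max(row)
        exps = [math.exp(v - peak) for v in row]
        total = sum(exps)
        out.append([e / total for e in exps])
    return out

def attention(q, k, v):
    """Reference attention: the full score matrix is materialized at once."""
    scale = 1.0 / math.sqrt(len(q[0]))
    scores = [[s * scale for s in row] for row in matmul(q, transpose(k))]
    return matmul(softmax_rows(scores), v)

def attention_row_blocked(q, k, v, block_rows=2):
    """Hypothetical row-blocked variant (an assumption, not the paper's scheduler).

    Only block_rows rows of the intermediate score matrix exist at any time,
    which mimics keeping the intermediate result in a small on-chip buffer.
    """
    kt = transpose(k)
    scale = 1.0 / math.sqrt(len(q[0]))
    out = []
    for start in range(0, len(q), block_rows):
        q_blk = q[start:start + block_rows]
        # Stage 1: first matrix product produces a slice of the scores.
        scores = [[s * scale for s in row] for row in matmul(q_blk, kt)]
        # Softmax is row-wise, so each slice can be normalized on its own.
        probs = softmax_rows(scores)
        # Stage 2: second matrix product consumes the slice immediately.
        out.extend(matmul(probs, v))
    return out
```

Because softmax normalizes each row independently, the blocked version produces the same result as the reference while holding only a slice of the intermediate matrix. How the two stages map onto a chain of two systolic arrays is the subject of the paper itself.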