Lasa: Abstraction and Specialization for Productive and Performant Linear Algebra on FPGAs

Xiaochen Hao, Mingzhe Zhang, Ce Sun, Zhuofu Tao, Hongbo Rong, Yu Zhang, Lei He, Eric Petit, Wenguang Chen, Yun Liang
DOI: https://doi.org/10.1109/FCCM57271.2023.00013
2023-01-01
Abstract: Linear algebra can often be significantly accelerated by spatial accelerators on FPGAs. As a broadly adopted linear algebra library, BLAS requires extensive optimizations for routines that vary vastly in data reuse, bottleneck resources, matrix storage layouts, and data types. Existing solutions face a dilemma between productivity and performance. We introduce Lasa, a framework composed of a programming model and a compiler, that addresses the dilemma by abstracting (for productivity) and specializing (for performance) the architecture of a spatial accelerator. Lasa abstracts a computation and its I/O as two dataflow graphs. A compiler maps the graphs onto systolic arrays and a customized memory hierarchy. The compiler further specializes the architecture transparently. In this framework, we develop 14 key BLAS routines and demonstrate performance on par with expert-written HLS code for BLAS level 3 routines, at least 80% of machine peak performance for level 2 and level 1 routines, and 1.6x to 7x speedup from exploiting the matrix properties of symmetry, triangularity, and bandedness.
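To make the last claim concrete, the following is a minimal software sketch of why a matrix property such as symmetry can reduce the data a routine must move. It is not Lasa's programming model, compiler output, or accelerator design; the function names `gemv_dense` and `symv_upper` and the read counter are hypothetical and chosen only for illustration. The sketch compares a general matrix-vector product with one that reads only the upper triangle of a symmetric matrix, in the spirit of the BLAS level 2 SYMV routine.

```python
def gemv_dense(a, x):
    """General matrix-vector product; reads all n*n matrix elements."""
    n = len(x)
    y = [0.0] * n
    reads = 0  # number of matrix elements loaded (illustrative counter)
    for i in range(n):
        for j in range(n):
            y[i] += a[i][j] * x[j]
            reads += 1
    return y, reads


def symv_upper(a, x):
    """Symmetric matrix-vector product that reads only the upper triangle.

    Each off-diagonal element a[i][j] with j > i is loaded once and used
    twice: once for y[i] and, by symmetry, once for y[j].
    """
    n = len(x)
    y = [0.0] * n
    reads = 0
    for i in range(n):
        y[i] += a[i][i] * x[i]  # diagonal element
        reads += 1
        for j in range(i + 1, n):
            v = a[i][j]
            reads += 1
            y[i] += v * x[j]
            y[j] += v * x[i]
    return y, reads


# Small symmetric example matrix (values are arbitrary).
a = [[2.0, 1.0, 0.0],
     [1.0, 3.0, 4.0],
     [0.0, 4.0, 5.0]]
x = [1.0, 2.0, 3.0]

y_dense, reads_dense = gemv_dense(a, x)
y_sym, reads_sym = symv_upper(a, x)
print(y_dense, reads_dense)
print(y_sym, reads_sym)
```

Both functions produce the same result, but the symmetric version loads n(n+1)/2 matrix elements instead of n*n. How such savings translate into the reported speedups on an FPGA depends on the accelerator architecture, which this sketch does not model.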