RapidStream IR: Infrastructure for FPGA High-Level Physical Synthesis

Jason Lau, Yuanlong Xiao, Yutong Xie, Yuze Chi, Linghao Song, Shaojie Xiang, Michael Lo, Zhiru Zhang, Jason Cong, Licheng Guo
DOI: https://doi.org/10.1145/3676536.3676649
2024-10-17
Abstract: The increasing complexity of large-scale FPGA accelerators poses significant challenges in achieving high performance while maintaining design productivity. High-level synthesis (HLS) has been adopted as a solution, but the mismatch between the high-level description and the physical layout often leads to suboptimal operating frequency. Although existing proposals for high-level physical synthesis, which use coarse-grained design partitioning, floorplanning, and pipelining to improve frequency, have gained traction, they lack a framework enabling (1) pipelining of real-world designs at arbitrary hierarchical levels, (2) integration of HLS blocks, vendor IPs, and handcrafted RTL designs, (3) portability to emerging new target FPGA devices, and (4) extensibility for the easy implementation of new design optimization tools. We present RapidStream IR, a practical high-level physical synthesis (HLPS) infrastructure for representing the composition of complex FPGA designs and exploring physical optimizations. Our approach introduces a flexible intermediate representation (IR) that captures interconnection protocols at arbitrary hierarchical levels, coarse-grained pipelining, and spatial information, enabling the creation of reusable passes for design frequency optimizations. RapidStream IR improves the frequency of a broad set of mixed-source designs by 7% to 62%, including large language models and genomics accelerators, and is portable to user-customizable new FPGA platforms. We further demonstrate its extensibility through case studies, showcasing the ability to facilitate future research.
Hardware Architecture; Distributed, Parallel, and Cluster Computing
What problem does this paper attempt to address?
This paper addresses the problem of achieving high performance while maintaining design productivity in large-scale FPGA accelerator designs. Specifically, it targets the following issues:

1. **Mismatch between High-Level Synthesis (HLS) and physical layout**: HLS-generated designs often reach suboptimal operating frequencies after physical implementation, because the high-level description does not match the physical layout.
2. **Limitations of existing High-Level Physical Synthesis (HLPS) methods**:
   - **Insufficient hierarchical optimization**: Existing methods do not support design optimization at arbitrary levels of hierarchy, so all task-level parallel modules must be interconnected in the top-level HLS function.
   - **Difficulty in integrating mixed sources**: They cannot integrate handcrafted RTL and vendor-provided IPs, although practical HLS designs often include these components.
   - **Poor device adaptability**: Existing methods are limited to specific FPGA devices, making it difficult to adapt to new hardware that meets specific computational or budgetary requirements.
   - **Limited extensibility**: There is no extensible framework for exploring different partitioning and pipelining schemes.

To address these issues, the paper proposes RapidStream IR (RIR), an infrastructure for FPGA high-level physical synthesis that supports mixed-source FPGA HLS designs and customizable FPGA devices, with the goal of optimizing designs for high frequency. RIR addresses the above issues through the following key features:

1. **Intermediate Representation (IR)**: Provides a flexible and extensible intermediate representation of the input design that can be transformed using any programming language. The IR captures the connectivity of the design, the pipelining capability at each level of the hierarchy, and spatial information.
2. **Reusable design optimization passes**: Provides a set of reusable passes for transforming designs, such as hierarchical reconstruction, module partitioning, and module insertion. Researchers can explore different optimization strategies through these passes and customize the framework to fit specific design goals.
3. **Support for multiple design formats**: Provides parsers for different design formats, such as Verilog, Xilinx Compiled IPs (XCI), and Vitis HLS-generated designs. The framework can be extended to other source formats by implementing information extractors.
4. **Cross-platform portability**: Ensures portability across different FPGA platforms by providing interfaces for defining new devices without changing the parsers or optimization passes.

Through these features, RIR not only improves the frequency of designs but also enhances design productivity and maintainability. Experimental results show that RIR achieves a 7% to 62% frequency improvement on various FPGA devices, with an average frequency reaching 244 MHz.
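
The idea of an IR that records connectivity, pipelining capability, and spatial information, and is then transformed by a reusable pass, can be illustrated with a small Python sketch. This is not the RapidStream IR format or API: the `Device`, `Module`, `Connection`, and `Design` classes, the `insert_pipeline_pass` function, the grid-slot model, and the rule of one register stage per slot boundary crossed are all hypothetical simplifications chosen for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass(frozen=True)
class Device:
    """Hypothetical device model: a grid of floorplan slots."""
    name: str
    cols: int
    rows: int


@dataclass
class Module:
    """A design block (HLS, RTL, or IP) with an optional slot assignment."""
    name: str
    slot: Optional[Tuple[int, int]] = None  # (col, row), assumed coordinates


@dataclass
class Connection:
    """A link between two modules, annotated with pipelining capability."""
    src: str
    dst: str
    pipelinable: bool  # e.g. a handshake interface that tolerates extra latency
    stages: int = 0    # register stages chosen by the pass


@dataclass
class Design:
    modules: Dict[str, Module] = field(default_factory=dict)
    connections: List[Connection] = field(default_factory=list)


def insert_pipeline_pass(design: Design, device: Device) -> Design:
    """Assign one register stage per slot boundary crossed (assumed heuristic)."""
    # The device is only consulted here, so the pass itself is device-agnostic.
    for module in design.modules.values():
        if module.slot is not None:
            col, row = module.slot
            if not (0 <= col < device.cols and 0 <= row < device.rows):
                raise ValueError(f"{module.name} is placed outside {device.name}")
    for conn in design.connections:
        src = design.modules[conn.src].slot
        dst = design.modules[conn.dst].slot
        # Only latency-tolerant links between placed modules are pipelined.
        if conn.pipelinable and src is not None and dst is not None:
            conn.stages = abs(src[0] - dst[0]) + abs(src[1] - dst[1])
    return design


# Usage: two modules placed in different slots of a made-up 2x3 device.
device = Device("example_2x3", cols=2, rows=3)
design = Design(
    modules={
        "producer": Module("producer", slot=(0, 0)),
        "consumer": Module("consumer", slot=(1, 2)),
    },
    connections=[
        Connection("producer", "consumer", pipelinable=True),
        Connection("producer", "consumer", pipelinable=False),
    ],
)
insert_pipeline_pass(design, device)
for conn in design.connections:
    print(conn.src, "->", conn.dst, "stages:", conn.stages)
```

In this sketch the pass reads only the annotations on the IR, so swapping in a different `Device` changes the placement check without touching the pass logic, which mirrors the separation between device definitions and optimization passes described above.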