Abstract: Multi-die FPGAs are crucial components in modern computing systems, particularly for high-performance applications such as artificial intelligence and data centers. In a multi-die FPGA built on a silicon interposer, super long lines (SLLs) provide the interconnections between super logic regions (SLRs). SLLs have significantly higher delay than regular interconnects, so their number needs to be minimized. As design complexity increases, the growing number of SLLs makes timing and power closure more difficult. Existing placement algorithms optimize the number of SLLs but are often limited to specific SLR topologies. Furthermore, they fall short of optimizing SLLs continuously throughout the entire placement process. This highlights the need for more advanced and adaptable solutions. In this paper, we propose LEAPS, a comprehensive, systematic, and adaptable multi-die FPGA placement algorithm for SLL minimization. Our contributions are threefold: 1) proposing a high-performance global placement algorithm for multi-die FPGAs that optimizes the number of SLLs while addressing other essential design constraints such as wirelength, routability, and clock routing; 2) introducing a versatile method that handles more complex SLR topologies of multi-die FPGAs, surpassing the limitations of existing approaches; and 3) optimizing SLLs continuously across all placement stages, including global placement (GP), legalization (LG), and detailed placement (DP). Experimental results demonstrate the effectiveness of LEAPS in reducing SLLs and enhancing circuit performance. Compared with the most recent state-of-the-art (SOTA) method, LEAPS achieves an average reduction of 43.08% in SLLs and 9.99% in HPWL, while exhibiting a notable 34.34$\times$ improvement in runtime.
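
The two quality metrics reported in the abstract, the number of SLLs and the half-perimeter wirelength (HPWL), can be illustrated with a minimal sketch. It is not the LEAPS algorithm; it only evaluates a given placement. It assumes the simplest topology, in which SLRs are stacked vertically with a uniform height, and it assumes that a net uses one SLL for every SLR boundary it spans. The names `slr_of`, `net_hpwl`, `net_sll_count`, and `evaluate`, as well as the `slr_height` parameter, are hypothetical and chosen for illustration only.

```python
from typing import Dict, List, Tuple

Point = Tuple[float, float]


def slr_of(y: float, slr_height: float) -> int:
    """Index of the SLR containing coordinate y (assumes vertically stacked SLRs)."""
    return int(y // slr_height)


def net_hpwl(pins: List[Point]) -> float:
    """Half-perimeter wirelength: width plus height of the pins' bounding box."""
    xs = [x for x, _ in pins]
    ys = [y for _, y in pins]
    return (max(xs) - min(xs)) + (max(ys) - min(ys))


def net_sll_count(pins: List[Point], slr_height: float) -> int:
    """SLR boundaries spanned by the net (assumption: one SLL per boundary)."""
    slrs = [slr_of(y, slr_height) for _, y in pins]
    return max(slrs) - min(slrs)


def evaluate(
    placement: Dict[str, Point],
    nets: List[List[str]],
    slr_height: float,
) -> Tuple[float, int]:
    """Return (total HPWL, total SLL count) for a placement."""
    total_hpwl = 0.0
    total_slls = 0
    for net in nets:
        pins = [placement[cell] for cell in net]
        total_hpwl += net_hpwl(pins)
        total_slls += net_sll_count(pins, slr_height)
    return total_hpwl, total_slls


# Toy example: three cells placed in three different SLRs of height 10.
placement = {"a": (1.0, 2.0), "b": (4.0, 12.0), "c": (3.0, 25.0)}
nets = [["a", "b"], ["b", "c"], ["a", "b", "c"]]
total_hpwl, total_slls = evaluate(placement, nets, slr_height=10.0)
print(total_hpwl, total_slls)
```

In this toy placement, moving cell `c` into the same SLR as `b` would remove the SLL crossings between them. A placer that minimizes SLLs makes this kind of trade-off against wirelength and the other constraints listed above.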

Latency Minimal Scheduling with Maximum Instruction Parallelism

Dependency Graph-based High-level Synthesis for Maximum Instruction Parallelism

Compiler Discovered Dynamic Scheduling of Irregular Code in High-Level Synthesis

Subgraph Extraction-based Feedback-guided Iterative Scheduling for HLS

GSA to HDL: Towards principled generation of dynamically scheduled circuits

Optimizing Scheduling Technology for Clustered VLIW Architectures Using Data Dependence Graph

Automatic multi-dimensional pipelining for high-level synthesis of dataflow accelerators

Optimizing VLIW Instruction Scheduling via a Two-Dimensional Constrained Dynamic Programming

Multilevel Granularity Parallelism Synthesis on FPGAs

Backtracking Optimized DDG Directed Scheduling Algorithm for Clustered VLIW Architectures

Joint Modulo Scheduling And Memory Partitioning With Multi-Bank Memory For High-Level Synthesis

High-performance Placement Engine for Modern Large-scale FPGAs With Heterogeneity and Clock Constraints

Memory partitioning-based modulo scheduling for high-level synthesis

Highly Efficient Modulo Loop Pipeline for High Level Synthesis

R-HLS: An IR for Dynamic High-Level Synthesis and Memory Disambiguation based on Regions and State Edges

LEAPS: Topological-Layout-Adaptable Multi-Die FPGA Placement for Super Long Line Minimization

GAHLS: an optimized graph analytics based high level synthesis framework

Deep Inverse Design for High-Level Synthesis

Limited Duplication-Based List Scheduling Algorithm for Heterogeneous Computing System

Ultra-Low Latency Service Provision in Edge Computing