Abstract: Recent progress in high-level synthesis (HLS) has raised the abstraction level of hardware design. HLS flows reduce designer effort by allowing development in a high-level language, which improves debugging, code reuse, and the ability to explore different implementation options. However, although the HLS process itself is fast, implementation and performance analysis still require lengthy logic synthesis and physical design. To optimize a design, HLS tools must explore the design space to extract parallelism at multiple levels of granularity, including parallelism within a single HLS-generated core and parallelism across multiple core instances. Core interconnect and external bandwidth limitations can significantly restrict the feasible options in the design space. With many dimensions to explore, it quickly becomes infeasible to perform full logic synthesis and physical design for every design point. Nevertheless, generating and evaluating the communication infrastructure as part of the exploration is critical for determining system performance. In this paper, we therefore extend the prior multilevel granularity parallelism exploration in the FCUDA HLS flow, which takes CUDA code as design input and generates a corresponding field-programmable gate array (FPGA) implementation. Our framework performs an initial characterization of the application design space, then analytically explores the design space considering parallelism, core interconnect, and external memory bandwidth, and selects a Pareto-optimal set of designs. The flow is fully automated: it characterizes the analytical model, performs the exploration, selects a solution, and integrates multiple instantiations of FCUDA cores via an Advanced eXtensible Interface (AXI) bus interconnect. Our results demonstrate that this new FCUDA flow efficiently identifies and generates implementations with up to 5× improved system performance compared to single-level granularity parallelism (core-level optimization).
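As an illustration of what selecting a Pareto-optimal set of designs can look like, the following is a minimal sketch and not the FCUDA implementation. The `estimate` function is a hypothetical toy model invented for this example: it assumes throughput grows with the product of a core-level unroll factor and the number of core instances until a fixed external bandwidth cap is reached, and that area grows with both knobs. All names, parameters, and constants are assumptions.

```python
from typing import List, NamedTuple


class DesignPoint(NamedTuple):
    unroll: int      # hypothetical core-level parallelism knob
    cores: int       # hypothetical number of core instances
    latency: float   # estimated execution time (arbitrary units)
    area: float      # estimated resource usage (arbitrary units)


def estimate(unroll: int, cores: int, work: float = 1024.0,
             bw_cap: float = 16.0, core_area: float = 10.0) -> DesignPoint:
    """Toy analytical model (an assumption, not the paper's model)."""
    # Throughput scales with total parallelism but saturates at the bandwidth cap.
    throughput = min(unroll * cores, bw_cap)
    latency = work / throughput
    # Each core costs a fixed area plus an amount proportional to its unroll factor.
    area = cores * (core_area + 2.0 * unroll)
    return DesignPoint(unroll, cores, latency, area)


def dominates(a: DesignPoint, b: DesignPoint) -> bool:
    """True if a is no worse than b in both objectives and better in at least one."""
    no_worse = a.latency <= b.latency and a.area <= b.area
    better = a.latency < b.latency or a.area < b.area
    return no_worse and better


def pareto_front(points: List[DesignPoint]) -> List[DesignPoint]:
    """Keep only the points that no other point dominates."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


# Enumerate a small design space and keep the non-dominated points.
points = [estimate(u, c) for u in (1, 2, 4, 8) for c in (1, 2, 4, 8)]
front = pareto_front(points)
```

In this toy model, designs whose total parallelism exceeds the bandwidth cap pay extra area without any latency gain, so they fall off the front. This mirrors the abstract's point that interconnect and external bandwidth limits shape which design points are worth considering.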
