Abstract: FPGAs are an attractive platform for applications with high computation demand and low energy consumption requirements. However, design effort for FPGA implementations remains high, often an order of magnitude larger than design effort using high-level languages. As an alternative to this time-consuming process, high-level synthesis (HLS) tools generate hardware implementations from algorithm descriptions in languages such as C/C++ and SystemC. Such tools reduce design effort: high-level descriptions are more compact and less error prone. HLS tools promise to let software designers develop hardware without detailed knowledge of the implementation platform. In this paper, we present an unbiased study of the performance, usability and productivity of HLS using AutoPilot (a state-of-the-art HLS tool). In particular, we first evaluate AutoPilot using popular embedded benchmark kernels. Then, to evaluate the suitability of HLS for real-world applications, we perform a case study of stereo matching, an active area of computer vision research that uses techniques also common in image denoising, image retrieval, feature matching, and face recognition. Based on our study, we provide insights into the current limitations of mapping general-purpose software to hardware using HLS and suggest some future directions for HLS tool development. We also offer several guidelines for hardware-friendly software design. For popular embedded benchmark kernels, the designs produced by HLS achieve 4× to 126× speedup over the software version. The stereo matching algorithms achieve between 3.5× and 67.9× speedup over software (but still less than manual RTL design) with a fivefold reduction in design effort versus manual RTL design.

High-level synthesis: productivity, performance, and software constraints
