Abstract: FPGA vendors provide OpenCL software development kits for easier programmability, with the goal of replacing time-consuming and error-prone register-transfer level (RTL) programming. Many studies explore optimization methods (e.g., loop unrolling, local memory) to accelerate OpenCL programs running on FPGAs. These programs typically follow the default OpenCL execution model, where a kernel deploys multiple work-items arranged into work-groups. However, the default execution model is not always a good fit for an application mapped to the FPGA architecture, which is very different from the multithreaded architecture of GPUs, for which OpenCL was originally designed. In this work, we identify three other execution models that can better utilize FPGA resources for OpenCL applications that do not fit well into the default execution model. These three execution models are based on two OpenCL features devised for FPGA programming, namely single work-item kernels and OpenCL channels. We observe that the choice of execution model determines the performance upper bound of a particular application, which can vary by two orders of magnitude between the most suitable execution model and the least suitable one. However, there is no way to select the most suitable execution model other than empirically exploring the optimization space of all four models, which can be prohibitively expensive. To help FPGA programmers identify the right execution model, we propose Boyi, a systematic framework that makes this decision automatically by analyzing OpenCL programming patterns in an application. After finding the right execution model with the help of Boyi, programmers can apply other conventional optimizations to reach the performance upper bound. Our experimental evaluation shows that Boyi can 1) accurately determine the right execution model, and 2) greatly reduce the exploration space of conventional optimization methods.

SOFF: An OpenCL High-Level Synthesis Framework for FPGAs

A Performance Analysis Framework for Optimizing OpenCL Applications on FPGAs

Boyi: A Systematic Framework for Automatically Deciding the Right Execution Model of OpenCL Applications on FPGAs

Resource-Aware Just-in-Time OpenCL Compiler for Coarse-Grained FPGA Overlays

A Comprehensive Framework for Synthesizing Stencil Algorithms on FPGAs Using OpenCL Model

From OpenCL to High-Performance Hardware on FPGAs

Challenging Portability Paradigms: FPGA Acceleration Using SYCL and OpenCL

Extending High-Level Synthesis for Task-Parallel Programs

Exploring memory synchronization and performance considerations for FPGA platform using the high-abstracted OpenCL framework: Benchmarks development and analysis

High Level Programming for Heterogeneous Architectures

FlexCL: A Model of Performance and Power for OpenCL Workloads on FPGAs

FOS: A Modular FPGA Operating System for Dynamic Workloads

A Framework for Iterative Stencil Algorithm Synthesis on FPGAs from OpenCL Programming Model (Abstract Only)

pocl: A Performance-Portable OpenCL Implementation

An Optimizing Framework on MLIR for Efficient FPGA-based Accelerator Generation

Towards Automatic Transformation of Legacy Scientific Code into OpenCL for Optimal Performance on FPGAs

FFCNN: Fast FPGA based Acceleration for Convolution neural network inference

HeteroFlow: An Accelerator Programming Model with Decoupled Data Placement for Software-Defined FPGAs

Integrated CUDA-to-FPGA Synthesis with Network-on-Chip

A Soft Processor Overlay with Tightly-coupled FPGA Accelerator