Abstract: Acceleration in the form of customized datapaths offers large performance and energy improvements over general-purpose processors. Reconfigurable fabrics such as FPGAs are gaining popularity for implementing application-specific accelerators, which increases the importance of good high-level FPGA design tools. However, current tools for targeting FPGAs offer inadequate support for high-level programming, resource estimation, and rapid, automatic design space exploration. We describe a design framework that addresses these challenges. We introduce a new representation of hardware using parameterized templates that captures locality and parallelism information at multiple levels of nesting. This representation is designed to be automatically generated from high-level languages based on parallel patterns. We describe a hybrid area estimation technique that uses template-level models and design-level artificial neural networks to account for effects from hardware place-and-route tools, including routing overheads, register and block RAM duplication, and LUT packing. Our runtime estimation accounts for off-chip memory accesses. We use our estimation capabilities to rapidly explore a large space of designs across tile sizes, parallelization factors, and optional coarse-grained pipelining, all at multiple loop levels. We show that estimates average 4.8% error for logic resources and 6.1% error for runtimes, and are 279 to 6533 times faster than a commercial high-level synthesis tool. We compare the best-performing designs to optimized CPU code running on a server-grade 6-core processor and show speedups of up to 16.7×.
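To make the idea of estimate-driven design space exploration concrete, the following is a minimal sketch and not the framework described above. It sweeps tile sizes and parallelization factors, discards points that exceed a resource budget, and keeps the area/runtime Pareto front. The names `estimate_area`, `estimate_cycles`, `explore`, `pareto`, the `LUT_BUDGET` constant, and all cost coefficients are hypothetical toy stand-ins; the actual work uses template-level models and neural networks for area, which are not reproduced here.

```python
from itertools import product

# Hypothetical LUT budget for the target device (an assumption for illustration).
LUT_BUDGET = 100_000


def estimate_area(tile, par):
    """Toy area model: fixed base, logic scaling with the parallelization
    factor, and buffer logic scaling with the tile size (made-up coefficients)."""
    return 5_000 + 1_200 * par + 2 * tile


def estimate_cycles(n, tile, par):
    """Toy runtime model: each tile pays an off-chip load cost plus compute."""
    num_tiles = -(-n // tile)      # ceiling division
    load = tile + 100              # assumed: one word per cycle plus fixed latency
    compute = -(-tile // par)      # tile elements processed `par` at a time
    return num_tiles * (load + compute)


def explore(n, tiles, pars):
    """Enumerate (tile, par) pairs and keep those that fit the budget."""
    points = []
    for tile, par in product(tiles, pars):
        if par > tile:
            continue               # cannot parallelize beyond the tile size
        area = estimate_area(tile, par)
        if area <= LUT_BUDGET:
            points.append((tile, par, area, estimate_cycles(n, tile, par)))
    return points


def pareto(points):
    """Return points not dominated in (area, cycles), sorted by area."""
    front = []
    for p in points:
        dominated = any(
            q[2] <= p[2] and q[3] <= p[3] and (q[2] < p[2] or q[3] < p[3])
            for q in points
        )
        if not dominated:
            front.append(p)
    return sorted(front, key=lambda p: p[2])


if __name__ == "__main__":
    candidates = explore(1024, tiles=[64, 256], pars=[1, 4])
    for tile, par, area, cycles in pareto(candidates):
        print(f"tile={tile} par={par} area={area} cycles={cycles}")
```

Because each estimate is a cheap function call rather than a full synthesis run, a sweep of this kind can cover many design points quickly; that is the property the fast estimators are meant to provide.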
