Abstract: We describe the design of a non-operating-system-based embedded system to automate the management, reordering, and movement of data produced by FPGA accelerators within data centre environments. In upcoming cloud computing environments, where FPGA acceleration may be leveraged via Infrastructure as a Service (IaaS), end users will no longer have full access to the underlying hardware resources. We envision a partially reconfigurable FPGA region that end users can access for their custom acceleration needs, and a static “template” region offered by the data centre to manage all Input/Output (IO) data requirements to the FPGA. Thus, our low-level software-controlled system allows for standard DDR access to off-chip memory, as well as DMA movement of data to and from SATA-based SSDs, and access to Ethernet stream links. Two use cases of FPGA accelerators are presented as experimental examples to demonstrate the area and performance costs of integrating our data-management system alongside such accelerators. Comparisons are also made to fully custom data management solutions implemented solely in RTL Verilog to determine the tradeoffs of using our system with regard to development time, area, and performance. We find that for a class of accelerators in which the physical data rate of an IO channel is the limiting bottleneck to accelerator throughput, our solution drastically reduces the logic development time spent on data management without any associated performance loss. However, for a class of applications where the IO channel is not the bottleneck, our solution trades increased area usage for shorter design times and for acceptable system throughput in the face of degraded IO throughput.

Arcus: SLO Management for Accelerators in the Cloud with Traffic Shaping

Accelerator-as-a-Service in Public Clouds: An Intra-Host Traffic Management View for Performance Isolation in the Wild

Triton: A Flexible Hardware Offloading Architecture for Accelerating Apsara Vswitch in Alibaba Cloud

Laius: Towards Latency Awareness and Improved Utilization of Spatial Multitasking Accelerators in Datacenters

Real-Time Scheduling Upon a Host-Centric Acceleration Architecture with Data Offloading

AuRORA: A Full-Stack Solution for Scalable and Virtualized Accelerator Integration

Poly: Efficient Heterogeneous System and Application Management for Interactive Applications

Autothrottle: A Practical Bi-Level Approach to Resource Management for SLO-Targeted Microservices

A Comprehensive Test Framework for Cryptographic Accelerators in the Cloud

Modeling Mobile Code Acceleration in the Cloud

ORCA: A Network and Architecture Co-design for Offloading us-scale Datacenter Applications

IO and data management for infrastructure as a service FPGA accelerators

Alita: Comprehensive Performance Isolation through Bias Resource Management for Public Clouds

A Study of FPGA Virtualization and Accelerator Scheduling

Optimizing Offload Performance in Heterogeneous MPSoCs

FLARE: Flexibly Sharing Commodity GPUs to Enforce QoS and Improve Utilization

Job Scheduling For Acceleration Systems In Cloud Computing

CHARM: Collaborative Host and Accelerator Resource Management for GPU Datacenters

LOAM: Low-latency Communication, Caching, and Computation Placement in Data-Intensive Computing Networks

RPCAcc: A High-Performance and Reconfigurable PCIe-attached RPC Accelerator