Abstract: Cloud providers have started to deploy various FPGA accelerators in their datacenters because the performance of many applications can be significantly improved by implementing their core routines in FPGAs. In conventional datacenters with FPGA-accelerated servers, a tenant that wants to use an FPGA accelerator requests a VM instance residing on a server equipped with one. This way of integrating FPGAs into the Cloud leads to poor sharing of the precious FPGA resources. In this paper, we propose FPGAPooling, an FPGA-enabled Cloud system where all FPGA accelerators are managed as a single resource pool and shared among all VMs. Instead of requesting that the Cloud run the VM on an FPGA-accelerated server, a VM requests an FPGA accelerator from the pool at run time, when it needs FPGA acceleration. After the VM finishes using the FPGA accelerator, it releases the accelerator back to the pool. We design a centralized scheduler to handle acceleration requests from VMs and assign each request to an idle FPGA accelerator at run time. We implemented a system prototype on IBM's OpenPower Cloud system. The key challenge of FPGAPooling is scheduling. We designed and implemented a group of scheduling algorithms for the FPGAPooling system. With extensive evaluations on both a small testbed and a large-scale simulation, we found that our algorithms can improve the average and tail job completion time by up to 7 and 4 times, respectively.

What problem does this paper attempt to address?

The paper addresses how to manage and share FPGA resources effectively in cloud computing environments. Specifically, it identifies that the conventional way of allocating FPGA-accelerated servers in data centers leads to inefficient utilization of valuable FPGA resources. In the conventional mode, a tenant that needs an FPGA accelerator must request a virtual machine (VM) instance located on a server equipped with FPGA accelerators, which limits the sharing of FPGA resources. To tackle this issue, the authors propose a system named FPGAPooling, which manages all FPGA accelerators as a single resource pool shared by all VMs. In FPGAPooling, a VM that requires FPGA acceleration no longer asks to run on a server with FPGA accelerators; it requests an FPGA accelerator directly from the resource pool. A centralized scheduler handles acceleration requests from VMs and assigns each request to an idle FPGA accelerator at run time. Once a VM finishes using the FPGA accelerator, the accelerator is released back into the resource pool, where it becomes available to other VMs. The paper also describes the design and implementation of the FPGAPooling system, as well as the challenges that scheduling raises and the solutions proposed for them. For example, to lower the expertise barrier of the hardware description languages (such as VHDL or Verilog) required for programming FPGAs, the system offers programming abstractions with API functions, allowing users to dynamically request and use different FPGA accelerators within their own applications.
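The request-and-release cycle described above can be illustrated with a minimal sketch. This is not the FPGAPooling API: the class name FpgaPool, its methods, and the accelerator IDs are all hypothetical, and the pool is modelled as an in-process queue rather than a networked, centralized scheduler.

```python
import threading
from collections import deque
from contextlib import contextmanager


class FpgaPool:
    """Toy pool (hypothetical): hands out idle accelerator IDs and takes them back."""

    def __init__(self, accelerator_ids):
        self._idle = deque(accelerator_ids)
        self._cond = threading.Condition()

    def acquire(self):
        # Block until some accelerator is idle, then hand it to the caller.
        with self._cond:
            while not self._idle:
                self._cond.wait()
            return self._idle.popleft()

    def release(self, accel_id):
        # Return the accelerator to the pool and wake one waiting requester.
        with self._cond:
            self._idle.append(accel_id)
            self._cond.notify()

    def idle_count(self):
        with self._cond:
            return len(self._idle)

    @contextmanager
    def accelerator(self):
        # Acquire on entry; always release on exit, even if the job fails.
        accel_id = self.acquire()
        try:
            yield accel_id
        finally:
            self.release(accel_id)


pool = FpgaPool(["fpga0", "fpga1"])   # hypothetical accelerator IDs
with pool.accelerator() as accel:
    in_use = pool.idle_count()        # one accelerator is busy here
after = pool.idle_count()             # both are idle again
```

Wrapping the acquire and release calls in a context manager is one way to make sure an accelerator always returns to the pool, which is the property the pooling model depends on.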
Moreover, the paper proposes several scheduling algorithms, including a Resource-Aware algorithm that accounts for network and Direct Memory Access (DMA) contention, and a Workload-Aware algorithm optimized for different workload distributions. Both aim to reduce the average and tail task completion times. The authors build a prototype on IBM's OpenPower cloud system, and their experiments show that the proposed Workload-and-Resource-Aware algorithm significantly shortens task completion. Compared to the traditional First-In-First-Out (FIFO) and Shortest-Job-First (SJF) scheduling strategies, the average task completion time and the 95th-percentile tail completion time are improved by up to 7 times and 4 times, respectively. In summary, FPGAPooling tackles the sharing and management of FPGA resources in cloud computing environments through resource pooling and scheduling strategies that take workload and resource contention into account, improving resource utilization and task processing efficiency.
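To make the two baseline strategies concrete, the sketch below simulates FIFO and SJF on a pool of identical accelerators. It is only an illustration under simplifying assumptions: all jobs arrive at time 0, job lengths are known in advance, and network and DMA contention are ignored. It does not implement the paper's Resource-Aware or Workload-Aware algorithms, and the function names and job lengths are hypothetical.

```python
import heapq


def completion_times(job_lengths, num_accelerators, policy="fifo"):
    """Simulate jobs that all arrive at t=0 on identical accelerators."""
    # SJF runs the shortest jobs first; FIFO keeps the submission order.
    order = sorted(job_lengths) if policy == "sjf" else list(job_lengths)
    # Min-heap of the times at which each accelerator next becomes idle.
    free_at = [0.0] * num_accelerators
    heapq.heapify(free_at)
    finished = []
    for length in order:
        start = heapq.heappop(free_at)   # earliest idle accelerator
        end = start + length
        finished.append(end)
        heapq.heappush(free_at, end)
    return finished


def mean(values):
    return sum(values) / len(values)


jobs = [10, 1, 1, 1]                      # hypothetical job lengths
fifo = completion_times(jobs, 1, "fifo")  # finishes at 10, 11, 12, 13
sjf = completion_times(jobs, 1, "sjf")    # finishes at 1, 2, 3, 13
```

With one accelerator and these job lengths, the average completion time is 11.5 under FIFO and 4.75 under SJF, while the last job finishes at time 13 in both cases. This shows why a scheduling policy can change the average considerably even when total work is fixed.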

FPGA Resource Pooling in Cloud Computing

Engaging Heterogeneous FPGAs in the Cloud.

Enabling FPGAs in the cloud.

Minimize the Make-span of Batched Requests for FPGA Pooling in Cloud Computing

Online scheduling for FPGA computation in the Cloud

Enabling Elastic Resource Management in Cloud FPGAs Via A Multi-layer Collaborative Approach.

A Study of FPGA Virtualization and Accelerator Scheduling

Towards Hardware Support for FPGA Resource Elasticity

Deploying Multi-tenant FPGAs within Linux-based Cloud Infrastructure

RC3E: Provision and Management of Reconfigurable Hardware Accelerators in a Cloud Environment

ConvCloud: an Adaptive Convolutional Neural Network Accelerator on Cloud FPGAs.

Enabling Efficient and Flexible FPGA Virtualization for Deep Learning in the Cloud

The Future of FPGA Acceleration in Datacenters and the Cloud

Hybrid Computing for Interactive Datacenter Applications

Architectural support for sharing, isolating and virtualizing FPGA resources.

Power Aware Scheduling of Tasks on FPGAs in Data Centers

A Unified FPGA Virtualization Framework for General-Purpose Deep Neural Networks in the Cloud

Disaggregated Accelerator Management System for Cloud Data Centers

Architecture Support for FPGA Multi-tenancy in the Cloud

Seeing Shapes in Clouds: On the Performance-Cost trade-off for Heterogeneous Infrastructure-as-a-Service

A Data-Center FPGA Acceleration Platform for Convolutional Neural Networks