FPGA Resource Pooling in Cloud Computing

Zhuangdi Zhu, Alex X. Liu, Fan Zhang, Fei Chen
DOI: https://doi.org/10.1109/tcc.2018.2874011
IF: 5.697
2021-04-01
IEEE Transactions on Cloud Computing
Abstract: Cloud providers have started to deploy various FPGA accelerators in their datacenters because the performance of many applications can be significantly improved by implementing their core routines in FPGAs. In conventional datacenters with FPGA-accelerated servers, a tenant that wants to use an FPGA accelerator must request a VM instance residing on a server equipped with one. This way of integrating FPGAs into the Cloud leads to poor sharing of valuable FPGA resources. In this paper, we propose FPGAPooling, an FPGA-enabled Cloud system in which all FPGA accelerators are managed as a single resource pool and shared among all VMs. Instead of asking the Cloud to run it on an FPGA-accelerated server, a VM requests an FPGA accelerator from the pool at run time, whenever it needs FPGA acceleration. After the VM finishes using the accelerator, it releases it back to the pool. We design a centralized scheduler that handles acceleration requests from VMs and assigns each request to an idle FPGA accelerator at run time, and we implemented a system prototype on IBM’s OpenPower Cloud system. The key challenge of FPGAPooling is scheduling, so we designed and implemented a group of scheduling algorithms for the FPGAPooling system. With extensive evaluations on both a small testbed and a large-scale simulation, we found that our algorithms can improve the average and tail job completion time by up to 7 and 4 times, respectively.
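The following Python sketch makes the sharing problem concrete. It compares two models:

- the conventional one, where a request can only use the FPGA on the server hosting its VM;
- a pooled one, where a central scheduler hands each request to whichever FPGA becomes idle first.

It is a toy model under stated assumptions, not the paper's scheduler. Jobs are served in FIFO order, all FPGAs are identical, and the `Job` type, its `home` field, and `simulate` are hypothetical names.

```python
from dataclasses import dataclass

@dataclass
class Job:
    arrival: float   # time the VM issues its acceleration request
    duration: float  # time the job occupies an FPGA
    home: int        # hypothetical: FPGA of the server hosting the VM

def simulate(jobs, num_fpgas, pooled):
    """Return completion times (finish - arrival) in arrival order, FIFO dispatch.

    pooled=True:  any FPGA may serve any job (pool model).
    pooled=False: a job may only use the FPGA on its VM's server (conventional model).
    """
    free_at = [0.0] * num_fpgas  # time at which each FPGA next becomes idle
    completion = []
    for job in sorted(jobs, key=lambda j: j.arrival):
        if pooled:
            # Central scheduler: pick the FPGA that becomes idle earliest.
            k = min(range(num_fpgas), key=lambda i: free_at[i])
        else:
            k = job.home
        start = max(job.arrival, free_at[k])  # wait if that FPGA is busy
        free_at[k] = start + job.duration
        completion.append(free_at[k] - job.arrival)
    return completion

# Three 4-unit jobs whose VMs all sit on server 0, with two FPGAs in the datacenter.
jobs = [Job(0, 4, 0), Job(0, 4, 0), Job(0, 4, 0)]
print(simulate(jobs, 2, pooled=False))  # [4, 8, 12]
print(simulate(jobs, 2, pooled=True))   # [4, 4, 8]
```

When the requests are pinned to server 0, the second FPGA sits idle while jobs queue behind the first. Pooling lets the scheduler use both FPGAs.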
computer science, information systems, theory & methods
What problem does this paper attempt to address?
The paper addresses how to manage and share FPGA resources effectively in cloud computing environments. It observes that the conventional way of allocating FPGA-accelerated servers in datacenters leaves valuable FPGA resources underutilized. A tenant who needs an FPGA accelerator must request a virtual machine (VM) instance on a server equipped with one, so each FPGA can only serve VMs placed on its own server. To address this, the authors propose FPGAPooling, a system that manages all FPGA accelerators as a single resource pool shared by all VMs. In FPGAPooling, a VM that needs FPGA acceleration no longer asks to run on an FPGA-equipped server; instead, it requests an accelerator from the pool at run time. A centralized scheduler handles these acceleration requests and assigns each one to an idle FPGA accelerator. Once the VM finishes, the accelerator is released back to the pool, where other VMs can use it. The paper describes the design and implementation of FPGAPooling in detail, together with the challenges it faces and the solutions it adopts, including for scheduling. For example, programming FPGAs directly requires expertise in hardware description languages such as VHDL or Verilog. To lower this barrier, the system offers programming abstractions in the form of API functions, allowing users to dynamically request and use different FPGA accelerators from within their own applications.
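The request/release programming model can be sketched as a thread-safe pool. Callers (VMs, modeled here as threads) block on the pool until a suitable accelerator is idle. This is a minimal illustration, not the FPGAPooling API:

- `AcceleratorPool`, `acquire`, `release`, and `accelerator` are hypothetical names.
- It assumes each FPGA is preloaded with exactly one accelerator function, identified by a string name such as `"gzip"`.

```python
import threading
from contextlib import contextmanager

class AcceleratorPool:
    """Hypothetical pool mapping an accelerator name to the ids of idle FPGAs.

    Assumes each FPGA is preloaded with exactly one accelerator function.
    """

    def __init__(self, inventory):
        # inventory: {accelerator name: [FPGA ids offering it]}
        self._idle = {name: list(ids) for name, ids in inventory.items()}
        self._cond = threading.Condition()

    def acquire(self, name, timeout=None):
        """Block until an FPGA offering `name` is idle; return its id, or None on timeout."""
        with self._cond:
            if not self._cond.wait_for(lambda: self._idle[name], timeout=timeout):
                return None
            return self._idle[name].pop()

    def release(self, name, fpga_id):
        """Return an FPGA to the pool and wake any requesters waiting on it."""
        with self._cond:
            self._idle[name].append(fpga_id)
            self._cond.notify_all()

    @contextmanager
    def accelerator(self, name):
        """Request an accelerator, and always release it when the block exits."""
        fpga_id = self.acquire(name)
        try:
            yield fpga_id
        finally:
            self.release(name, fpga_id)

pool = AcceleratorPool({"gzip": [0, 1], "aes": [2]})
with pool.accelerator("gzip") as fpga:
    print("running gzip on FPGA", fpga)  # running gzip on FPGA 1
```

Wrapping the acquire/release pair in a context manager guarantees that the FPGA returns to the pool even if the accelerated call raises. This mirrors the release-after-use step described above.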
The paper also proposes several scheduling algorithms, both aimed at reducing the average and tail job completion times. A Resource-Aware algorithm accounts for network and Direct Memory Access (DMA) contention. A Workload-Aware algorithm is tuned to different workload distributions. The authors evaluated these on a prototype built on IBM's OpenPower cloud system and in large-scale simulation. The results show that the combined Workload-and-Resource-Aware algorithm substantially shortens job completion times. Compared with conventional First-In-First-Out (FIFO) and Shortest-Job-First (SJF) scheduling, it improves the average job completion time by up to 7 times and the 95th-percentile tail completion time by up to 4 times. In summary, through resource pooling and workload- and resource-aware scheduling, FPGAPooling addresses the sharing and management of FPGA resources in the cloud, improving resource utilization and job processing efficiency.
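A small non-preemptive simulation reproduces the difference between FIFO and SJF ordering, and shows why average and tail completion time must be measured separately. The sketch does not implement the paper's Resource-Aware or Workload-Aware algorithms, whose details are not given here. The assumptions are:

- `schedule` and `p95` are hypothetical helpers.
- Jobs are `(arrival, duration)` pairs.
- The 95th percentile uses the nearest-rank definition.

```python
import heapq
import math

def schedule(jobs, num_fpgas, policy):
    """Non-preemptive dispatch of (arrival, duration) jobs onto identical FPGAs.

    policy="fifo" serves the earliest-arriving waiting job; policy="sjf" serves
    the shortest one. Returns completion times (finish - arrival) in input order.
    """
    order = sorted(range(len(jobs)), key=lambda i: jobs[i][0])
    free_at = [0.0] * num_fpgas  # when each FPGA next becomes idle
    waiting = []                 # heap of (priority, job index)
    done = [0.0] * len(jobs)
    clock, nxt = 0.0, 0
    while nxt < len(order) or waiting:
        # The next dispatch happens when the earliest FPGA becomes idle...
        k = min(range(num_fpgas), key=lambda i: free_at[i])
        clock = max(clock, free_at[k])
        # ...or, if nothing is queued, when the next job arrives.
        if not waiting:
            clock = max(clock, jobs[order[nxt]][0])
        # Queue every job that has arrived by now.
        while nxt < len(order) and jobs[order[nxt]][0] <= clock:
            i = order[nxt]
            priority = jobs[i][0] if policy == "fifo" else jobs[i][1]
            heapq.heappush(waiting, (priority, i))
            nxt += 1
        _, i = heapq.heappop(waiting)
        free_at[k] = clock + jobs[i][1]
        done[i] = free_at[k] - jobs[i][0]
    return done

def p95(xs):
    """95th percentile by the nearest-rank method."""
    s = sorted(xs)
    return s[math.ceil(0.95 * len(s)) - 1]

jobs = [(0, 10), (0, 1), (0, 1), (0, 1)]  # one long job queued ahead of three short ones
for policy in ("fifo", "sjf"):
    ct = schedule(jobs, 1, policy)
    print(policy, sum(ct) / len(ct), p95(ct))
# fifo 11.5 13.0
# sjf 4.75 13.0
```

In this toy case, SJF cuts the mean completion time from 11.5 to 4.75 but leaves the 95th percentile unchanged, because the one long job is pushed to the back of the queue. Average and tail metrics can therefore move independently.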