Abstract: Network bandwidth is improving faster than the compute capacity of the host CPU, turning the CPU into a bottleneck. As a result, SmartNICs are often used to offload packet processing, and even application logic, away from the CPU. However, many applications today, such as Artificial Intelligence (AI) and High Performance Computing (HPC), rely on clusters of GPUs for computation. In such clusters, the majority of the network traffic is created by the GPUs. Unfortunately, commercially available multi-core SmartNICs, such as BlueField-2, cannot process 100Gb network traffic at line rate with their embedded CPUs, which are capable only of control-plane management. Commercially available FPGA-based SmartNICs, in turn, are mainly optimized for network applications running on the host CPU. To address such scenarios, in this paper we present FpgaNIC, a GPU-oriented SmartNIC that accelerates applications running on distributed GPUs. FpgaNIC is an FPGA-based, GPU-centric, versatile SmartNIC that enables direct PCIe P2P communication with local GPUs using GPU virtual addresses and provides reliable 100Gb network access to remote GPUs. FpgaNIC allows various complex compute tasks to be offloaded to a customized data-path accelerator for line-rate in-network computing on the FPGA, thereby complementing the processing at the GPU. The data-path accelerator can be programmed using C++-based HLS (High Level Synthesis), which makes it easier for software programmers to use. FpgaNIC has been designed to explore the design space of SmartNICs, e.g., the direct, on-path, and off-path models, each of which benefits different types of applications. It opens up a wealth of research opportunities, e.g., accelerating a broad range of distributed applications by combining GPUs and FPGAs, and exploring a larger design space of SmartNICs by making them easily accessible from local GPUs.

FpgaNIC: An FPGA-based Versatile 100Gb SmartNIC for GPUs

Efficient PC-FPGA Communication over Gigabit Ethernet

A 400Gbit Ethernet core enabling High Data Rate Streaming from FPGAs to Servers and GPUs in Radio Astronomy

Extracting TCPIP Headers at High Speed for the Anonymized Network Traffic Graph Challenge

FPGA-Based AI Smart NICs for Scalable Distributed AI Training Systems

High-Performance Reconfigurable Pipeline Implementation for FPGA-Based SmartNIC

G-NET: Effective GPU Sharing in NFV Systems

Leveraging Silicon-Photonic NoC for Designing Scalable GPUs

NPGPU: Network processing on graphics processing units

HPIPE: Heterogeneous Layer-Pipelined and Sparse-Aware CNN Inference for FPGAs

FPX-NIC: An FPGA-Accelerated 4K Ultra-High-Definition Neural Video Coding System

A Survey of FPGA-Based Neural Network Accelerator

Nuclei: GPU-Accelerated Many-Core Network Coding

Accelerating Mobile Applications at the Network Edge with Software-Programmable FPGAs

A High-Performance and Flexible Architecture for Accelerating SDN on the MPSoC Platform

A Ubiquitous Machine Learning Accelerator With Automatic Parallelization on FPGA

Multi-dimensional Packet Classification on FPGA: 100 Gbps and Beyond

Toward Full-Stack Acceleration of Deep Convolutional Neural Networks on FPGAs

Software-Hardware Co-design of Heterogeneous SmartNIC System for Recommendation Models Inference and Training

Pflow: An end-to-end heterogeneous acceleration framework for CNN inference on FPGAs

Communication-Aware and Resource-Efficient NoC-Based Architecture for CNN Acceleration