Accelerating Machine Learning Inference with GPUs in ProtoDUNE Data Processing

Tejin Cai, Kenneth Herner, Tingjun Yang, Michael Wang, Maria Acosta Flechas, Philip Harris, Burt Holzman, Kevin Pedro, Nhan Tran

DOI: https://doi.org/10.1007/s41781-023-00101-0

2023-10-28

Abstract: We study the performance of a cloud-based GPU-accelerated inference server to speed up event reconstruction in neutrino data batch jobs. Using detector data from the ProtoDUNE experiment and employing the standard DUNE grid job submission tools, we attempt to reprocess the data by running several thousand concurrent grid jobs, a rate we expect to be typical of current and future neutrino physics experiments. We process most of the dataset with the GPU version of our processing algorithm and the remainder with the CPU version for timing comparisons. We find that a 100-GPU cloud-based server is able to easily meet the processing demand, and that using the GPU version of the event processing algorithm is two times faster than processing these data with the CPU version when comparing to the newest CPUs in our sample. The amount of data transferred to the inference server during the GPU runs can overwhelm even the highest-bandwidth network switches, however, unless care is taken to observe network facility limits or otherwise distribute the jobs to multiple sites. We discuss the lessons learned from this processing campaign and several avenues for future improvements.

High Energy Physics - Experiment; Distributed, Parallel, and Cluster Computing; Data Analysis, Statistics and Probability

What problem does this paper attempt to address?

This paper addresses how to speed up event reconstruction in large-scale data processing by using GPU-accelerated machine-learning inference. Specifically, the researchers ask whether a cloud-based GPU-accelerated inference server can significantly reduce event reconstruction time when processing data from the ProtoDUNE experiment, and whether the approach can keep up with the higher data volumes expected in the future. The paper also explores how to set limits on job concurrency when network resources are limited, so that the network does not saturate while the benefit of GPU acceleration is preserved. These challenges matter for current and future neutrino physics experiments, which must process large amounts of data quickly and efficiently. In particular, when a supernova explosion within or near the Milky Way is detected, rapid reconstruction of detector trigger records is crucial for providing prompt location information to optical telescopes.
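
The concurrency limit mentioned above comes down to simple arithmetic: the combined traffic of all running jobs should stay below the capacity of the network link to the inference server. The following is a minimal sketch of that calculation, not the method used in the paper. The function name, the `headroom` parameter, and all numbers in the usage example are hypothetical and chosen only for illustration.

```python
import math


def max_concurrent_jobs(link_gbps, per_job_mbps, headroom=0.8):
    """Estimate how many jobs can run before a network link saturates.

    link_gbps    -- capacity of the bottleneck link in Gbit/s (assumed value)
    per_job_mbps -- average traffic one job sends to the server in Mbit/s
                    (assumed value; would need to be measured in practice)
    headroom     -- fraction of the link that jobs are allowed to use,
                    a hypothetical safety margin
    """
    if link_gbps <= 0 or per_job_mbps <= 0:
        raise ValueError("link_gbps and per_job_mbps must be positive")
    if not 0 < headroom <= 1:
        raise ValueError("headroom must be in the interval (0, 1]")

    # Convert the link capacity to Mbit/s and apply the safety margin.
    usable_mbps = link_gbps * 1000 * headroom

    # Round down: a partial job would push traffic over the limit.
    return math.floor(usable_mbps / per_job_mbps)


if __name__ == "__main__":
    # Hypothetical numbers: a 100 Gbit/s link, 50 Mbit/s per job,
    # and jobs allowed to use half of the link.
    print(max_concurrent_jobs(100, 50, headroom=0.5))
```

If the number of jobs needed exceeds this estimate for a single site, the abstract's other suggestion applies: distribute the jobs across multiple sites so that no single link carries all of the traffic.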
