Abstract: Computing demands for large scientific experiments, such as the CMS experiment at the CERN LHC, will increase dramatically in the next decades. To complement the future performance increases of software running on central processing units (CPUs), explorations of coprocessor usage in data processing hold great potential and interest. Coprocessors are a class of computer processors that supplement CPUs, often improving the execution of certain functions due to architectural design choices. We explore the approach of Services for Optimized Network Inference on Coprocessors (SONIC) and study the deployment of this as-a-service approach in large-scale data processing. In the studies, we take a data processing workflow of the CMS experiment and run the main workflow on CPUs, while offloading several machine learning (ML) inference tasks onto either remote or local coprocessors, specifically graphics processing units (GPUs). With experiments performed at Google Cloud, the Purdue Tier-2 computing center, and combinations of the two, we demonstrate the acceleration of these ML algorithms individually on coprocessors and the corresponding throughput improvement for the entire workflow. This approach can be easily generalized to different types of coprocessors and deployed on local CPUs without decreasing the throughput performance. We emphasize that the SONIC approach enables high coprocessor usage and enables the portability to run workflows on different types of coprocessors.

What problem does this paper attempt to address?

The paper addresses the sharply increasing computational demands that large scientific experiments (such as the CMS experiment at CERN's LHC) will face in the coming decades. As data acquisition rates and event complexity continue to rise, the expected performance improvements of central processing units (CPUs) alone will not meet future computational needs. The paper therefore explores the use of coprocessors, such as graphics processing units (GPUs) and field-programmable gate arrays (FPGAs), offered as a service (inference as a service, IaaS) to accelerate data processing workflows.

### Main Issues:

1. **Growth in Computational Demand**: The CMS experiment will face a significant increase in data volume in future physics runs, posing enormous computational challenges for data processing.
2. **Limited CPU Performance Improvement**: Software performance on CPUs will continue to improve, but not enough to fully meet the growing computational demands.
3. **Potential of Coprocessors**: Due to their architectural design, coprocessors can significantly improve the execution efficiency of certain computational tasks, especially machine learning (ML) inference tasks.

### Solution:

- **SONIC Method**: The paper studies the "Services for Optimized Network Inference on Coprocessors" (SONIC) approach, which accelerates data processing workflows by offloading ML inference tasks to remote or local coprocessors.
- **IaaS Framework**: CPU clients send inference requests over network calls to coprocessor servers, making effective use of heterogeneous computing resources.
- **Flexibility and Portability**: The SONIC approach supports different types of coprocessors and can also run on local CPUs without reducing throughput.

### Goals:

- **Accelerate ML Algorithms**: Increase the execution speed of ML inference tasks by offloading them to coprocessors.
- **Optimize Overall Workflow**: Improve the throughput of the entire data processing workflow, ensuring efficient use of computational resources in large-scale production environments.
- **Scalability and Adaptability**: Ensure that the method can be easily scaled to different types and numbers of coprocessors and deployed in various computing environments.

### Experimental Validation:

- **Experimental Environment**: The experiments were performed at Google Cloud, the Purdue Tier-2 computing center, and combinations of the two.
- **Performance Evaluation**: The paper demonstrates the acceleration of the individual ML algorithms on coprocessors and the corresponding improvement in overall workflow throughput.

Through this research, the paper aims to provide an efficient, flexible, and scalable solution for data processing in future high-energy physics experiments.
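The client-server offloading pattern described under "Solution" can be illustrated with a minimal Python sketch. It is not the SONIC implementation: the names `InferenceServer` and `process_event` are hypothetical, the "model" is a toy weighted sum, and a local thread pool stands in for the remote coprocessor server and its network calls. The sketch only shows the control flow: the CPU client submits an inference request, continues with its own work, and collects the result later.

```python
from concurrent.futures import Future, ThreadPoolExecutor
from typing import Dict, List


class InferenceServer:
    """Hypothetical stand-in for a coprocessor-backed inference server."""

    def __init__(self, max_workers: int = 2) -> None:
        # Assumption: a thread pool plays the role of the remote coprocessor(s).
        self._pool = ThreadPoolExecutor(max_workers=max_workers)

    @staticmethod
    def _model(features: List[float]) -> float:
        # Toy "model": a fixed weighted sum standing in for a real ML network.
        weights = [0.5, 0.25, 0.25]
        return sum(w * x for w, x in zip(weights, features))

    def submit(self, features: List[float]) -> Future:
        # Returns immediately; a real deployment would issue a network call here.
        return self._pool.submit(self._model, features)

    def shutdown(self) -> None:
        self._pool.shutdown(wait=True)


def process_event(server: InferenceServer, event: Dict[str, list]) -> Dict[str, float]:
    """Process one event, offloading the ML step and overlapping it with CPU work."""
    # 1. Send the inference request without blocking the client.
    pending = server.submit(event["features"])
    # 2. Continue with the non-ML part of the workflow on the CPU.
    cpu_result = sum(event["hits"])
    # 3. Collect the inference result once it is needed.
    ml_score = pending.result(timeout=5.0)
    return {"cpu_result": cpu_result, "ml_score": ml_score}


if __name__ == "__main__":
    server = InferenceServer()
    event = {"features": [2.0, 4.0, 8.0], "hits": [1, 2, 3]}
    print(process_event(server, event))
    server.shutdown()
```

Because the client only depends on the `submit` interface, the backend behind it could in principle be swapped (GPU, another coprocessor type, or a local CPU) without changing the client code, which is the portability argument the summary makes.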

Portable acceleration of CMS computing workflows with coprocessors as a service
