Abstract: ytopt is a Python machine-learning-based autotuning software package developed within the ECP PROTEAS-TUNE project. The ytopt software adopts an asynchronous search framework that samples a small number of input parameter configurations and progressively fits a surrogate model over the input-output space until the user-defined maximum number of evaluations or the wall-clock time is exhausted. libEnsemble is a Python toolkit, developed within the ECP PETSc/TAO project, for coordinating workflows of asynchronous and dynamic ensembles of calculations across massively parallel resources. libEnsemble helps users take advantage of massively parallel resources to solve design, decision, and inference problems and expands the class of problems that can benefit from increased parallelism. In this paper we present our methodology and framework for integrating ytopt and libEnsemble to take advantage of massively parallel resources and accelerate the autotuning process. Specifically, we focus on using the proposed framework to autotune the ECP ExaSMR application OpenMC, an open source Monte Carlo particle transport code. OpenMC has seven tunable parameters, some of which have large ranges, such as the number of particles in flight, which ranges from 100,000 to 8 million with a default setting of 1 million. Finding the combination of these parameter values that achieves the best performance is extremely time-consuming. Therefore, we apply the proposed framework to autotune the MPI/OpenMP offload version of OpenMC based on a user-defined metric, such as the figure of merit (FoM, in particles/s) or energy efficiency measured by the energy-delay product (EDP), on Crusher at the Oak Ridge Leadership Computing Facility. The experimental results show that we achieve improvements of up to 29.49% in FoM and up to 30.44% in EDP.
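As a rough illustration of the two user-defined metrics named in the abstract, the sketch below computes FoM as particles per second and EDP as energy multiplied by runtime (the standard definition of the energy-delay product). The function names, the sample numbers, and the formula used for "percent improvement" are assumptions made for illustration; the abstract does not state how the reported percentages were computed.

```python
def figure_of_merit(particles: int, seconds: float) -> float:
    """FoM in particles per second; higher is better."""
    return particles / seconds


def energy_delay_product(energy_joules: float, seconds: float) -> float:
    """EDP = energy x runtime (joule-seconds); lower is better."""
    return energy_joules * seconds


def improvement_pct(baseline: float, tuned: float, higher_is_better: bool) -> float:
    """Relative improvement over the baseline in percent (assumed definition)."""
    if higher_is_better:
        return 100.0 * (tuned - baseline) / baseline
    return 100.0 * (baseline - tuned) / baseline


# Hypothetical measurements, not taken from the paper.
base_fom = figure_of_merit(8_000_000, 10.0)
tuned_fom = figure_of_merit(8_000_000, 8.0)
fom_gain = improvement_pct(base_fom, tuned_fom, higher_is_better=True)

base_edp = energy_delay_product(5000.0, 10.0)
tuned_edp = energy_delay_product(4500.0, 8.0)
edp_gain = improvement_pct(base_edp, tuned_edp, higher_is_better=False)
```

Because FoM is maximized and EDP is minimized, an autotuner driven by either metric needs to know which direction counts as better; the `higher_is_better` flag makes that explicit here.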

What problem does this paper attempt to address?

The problem this paper addresses is how to use an integrated framework of ytopt and libEnsemble to accelerate the autotuning of OpenMC (an open-source Monte Carlo particle transport code) in order to improve its performance and energy efficiency. Specifically, the paper combines the strengths of the two tools to make full use of massively parallel resources while tuning the seven tunable parameters of OpenMC, and thereby achieves better values of user-defined metrics, such as the figure of merit (FoM, in particles/s) and the energy-delay product (EDP), on the Crusher supercomputer.

### Problem Background

1. **Challenges in High-Performance Computing (HPC) Systems**:
   - In the exascale computing era, performance, power, and energy management remain key concerns and constraints in the design of large-scale HPC systems.
   - Factors such as dynamic phase behavior, manufacturing variance, and system-level heterogeneity make it very challenging to use power efficiently and to optimize scientific applications.
2. **Limitations of Existing Autotuning Methods**:
   - Traditional autotuning methods rely on heuristics derived from automatically tuned BLAS libraries, on experience, and on model-based methods.
   - These methods become impractical when hardware, software, and applications are complex.
   - Traditional methods can usually evaluate only one parameter configuration at a time, which makes the overall tuning process very time-consuming.

### Solutions

1. **Introduction to ytopt**:
   - ytopt is a machine-learning-based autotuning software package that uses Bayesian optimization to progressively fit a surrogate model over the input-output space.
   - It explores the parameter space effectively, but it evaluates only one parameter configuration at a time, which limits its efficiency.
2. **Introduction to libEnsemble**:
   - libEnsemble is a Python toolkit for coordinating workflows of asynchronous and dynamic ensembles of calculations across massively parallel resources.
   - It helps users exploit massively parallel resources to solve design, decision, and inference problems, and it expands the class of problems that can benefit from increased parallelism.
3. **Integrated Framework ytopt-libe**:
   - Integrating ytopt with libEnsemble yields a new asynchronous autotuning framework, ytopt-libe.
   - The framework is asynchronous in two respects:
     - Asynchronous search: the search does not wait for all evaluation results. As soon as one evaluation completes, the surrogate model is retrained with the new data.
     - Asynchronous evaluation: libEnsemble's asynchronous and dynamic manager/worker scheme evaluates multiple selected parameter configurations simultaneously.
4. **Application Case: OpenMC**:
   - OpenMC has seven tunable parameters, some of which have a large range (for example, the number of particles in flight ranges from 100,000 to 8 million, with a default setting of 1 million).
   - The ytopt-libe framework is used to autotune OpenMC for performance and energy efficiency.
   - Experimental results show that the framework achieves improvements of up to 29.49% in FoM and up to 30.44% in EDP on the Crusher supercomputer.

### Summary

The paper proposes a new asynchronous autotuning framework, ytopt-libe, which integrates ytopt and libEnsemble to accelerate the autotuning of OpenMC, make full use of massively parallel resources, and optimize performance and energy efficiency. On Crusher, the framework improves FoM by up to 29.49% and EDP by up to 30.44%.
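The two asynchronous aspects described under the integrated framework can be sketched with the Python standard library alone. This is a minimal illustration under stated assumptions, not the ytopt or libEnsemble API: a thread pool stands in for the manager/worker scheme, a nearest-neighbor lookup stands in for the Bayesian surrogate model, and the search space, `evaluate`, `sample`, `predict`, `suggest`, and `autotune` are all hypothetical names invented here.

```python
import random
from concurrent.futures import FIRST_COMPLETED, ThreadPoolExecutor, wait

# Hypothetical two-parameter search space (not OpenMC's real parameters).
SPACE = {"x": (0, 100), "y": (0, 100)}


def evaluate(config):
    """Stand-in for one application run; returns a cost to minimize."""
    return (config["x"] - 30) ** 2 + (config["y"] - 70) ** 2


def sample(rng):
    """Draw one random configuration from the search space."""
    return {k: rng.randint(lo, hi) for k, (lo, hi) in SPACE.items()}


def predict(history, config):
    """Toy surrogate: the cost of the nearest already-evaluated configuration."""
    nearest = min(
        history,
        key=lambda h: sum((h[0][k] - config[k]) ** 2 for k in SPACE),
    )
    return nearest[1]


def suggest(history, rng, n_candidates=50):
    """Pick the random candidate that the surrogate predicts to be cheapest."""
    candidates = [sample(rng) for _ in range(n_candidates)]
    return min(candidates, key=lambda c: predict(history, c))


def autotune(max_evals=40, n_workers=4, seed=0):
    rng = random.Random(seed)
    history = []  # (config, cost) pairs: the surrogate's training data
    pending = {}  # future -> config currently being evaluated
    submitted = 0
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        # Start by giving every worker a randomly sampled configuration.
        while submitted < min(n_workers, max_evals):
            cfg = sample(rng)
            pending[pool.submit(evaluate, cfg)] = cfg
            submitted += 1
        while pending:
            # Asynchronous search: react as soon as ANY evaluation finishes.
            done, _ = wait(pending, return_when=FIRST_COMPLETED)
            for fut in done:
                history.append((pending.pop(fut), fut.result()))
                if submitted < max_evals:
                    # The surrogate now includes the new result; the freed
                    # worker is refilled while the others keep running.
                    cfg = suggest(history, rng)
                    pending[pool.submit(evaluate, cfg)] = cfg
                    submitted += 1
    best = min(history, key=lambda h: h[1])
    return best, history


if __name__ == "__main__":
    best, history = autotune()
    print("best configuration:", best[0], "cost:", best[1])
```

The loop stops on an evaluation budget only; a wall-clock limit, which the abstract also mentions as a stopping criterion, could be added by checking elapsed time before each new submission. Because completion order depends on thread scheduling, repeated runs with the same seed may explore different configurations.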

Integrating ytopt and libEnsemble to Autotune OpenMC
