Integrating ytopt and libEnsemble to Autotune OpenMC

Xingfu Wu, John R. Tramm, Jeffrey Larson, John-Luke Navarro, Prasanna Balaprakash, Brice Videau, Michael Kruse, Paul Hovland, Valerie Taylor, Mary Hall
2024-09-18
Abstract: ytopt is a Python machine-learning-based autotuning software package developed within the ECP PROTEAS-TUNE project. The ytopt software adopts an asynchronous search framework that consists of sampling a small number of input parameter configurations and progressively fitting a surrogate model over the input-output space until exhausting the user-defined maximum number of evaluations or the wall-clock time. libEnsemble is a Python toolkit for coordinating workflows of asynchronous and dynamic ensembles of calculations across massively parallel resources developed within the ECP PETSc/TAO project. libEnsemble helps users take advantage of massively parallel resources to solve design, decision, and inference problems and expands the class of problems that can benefit from increased parallelism. In this paper we present our methodology and framework to integrate ytopt and libEnsemble to take advantage of massively parallel resources to accelerate the autotuning process. Specifically, we focus on using the proposed framework to autotune the ECP ExaSMR application OpenMC, an open source Monte Carlo particle transport code. OpenMC has seven tunable parameters, some of which have large ranges, such as the number of particles in-flight, which ranges from 100,000 to 8 million with a default setting of 1 million. Setting the proper combination of these parameter values to achieve the best performance is extremely time-consuming. Therefore, we apply the proposed framework to autotune the MPI/OpenMP offload version of OpenMC based on a user-defined metric such as the figure of merit (FoM) (particles/s) or the energy-efficiency metric energy-delay product (EDP) on Crusher at the Oak Ridge Leadership Computing Facility. The experimental results show that we achieve improvements of up to 29.49% in FoM and up to 30.44% in EDP.
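The two tuning metrics named in the abstract can be illustrated with a minimal Python sketch. FoM in particles/s comes from the abstract. EDP is taken here in its usual sense of energy multiplied by runtime, and the way a percentage improvement is computed is an assumption, since the abstract does not spell out either formula. All function names are hypothetical.

```python
def figure_of_merit(particles: float, seconds: float) -> float:
    """FoM as particles processed per second (higher is better)."""
    return particles / seconds


def energy_delay_product(energy_joules: float, seconds: float) -> float:
    """EDP as energy times runtime (lower is better); assumed definition."""
    return energy_joules * seconds


def improvement_pct(baseline: float, tuned: float, higher_is_better: bool) -> float:
    """Relative improvement over the baseline, in percent (assumed formula)."""
    if higher_is_better:
        return (tuned - baseline) / baseline * 100.0
    return (baseline - tuned) / baseline * 100.0
```

An autotuner would maximize the first metric or minimize the second, so the `higher_is_better` flag keeps the sign of the reported improvement consistent across both.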
Performance
What problem does this paper attempt to address?
The problem this paper attempts to solve is how to use an integrated framework of ytopt and libEnsemble to accelerate the autotuning of OpenMC (an open-source Monte Carlo particle transport code) in order to improve performance and energy efficiency. Specifically, the paper combines the strengths of the two tools to exploit massively parallel resources, tunes seven OpenMC parameters, and thereby improves user-defined metrics on the Crusher supercomputer: the figure of merit (FoM, in particles/s) for performance and the energy-delay product (EDP) for energy efficiency.

### Problem Background

1. **Challenges in High-Performance Computing (HPC) Systems**:
   - As we enter the exascale computing era, performance, power, and energy management remain key constraints in the design of large-scale HPC systems.
   - Factors such as dynamic phase behavior, manufacturing variation, and system-level heterogeneity make it very challenging to use power efficiently and to optimize scientific applications.
2. **Limitations of Existing Autotuning Methods**:
   - Traditional autotuning methods rely on heuristics derived from automatically tuned BLAS libraries, experience, and model-based methods.
   - These methods become impractical when facing complex hardware, software, and applications.
   - Traditional methods can usually evaluate only one parameter configuration at a time, which makes the overall autotuning process very time-consuming.

### Solutions

1. **Introduction to ytopt**:
   - ytopt is a machine-learning-based autotuning software package that uses Bayesian optimization to progressively fit a surrogate model over the input-output space.
   - It explores the parameter space effectively, but it can evaluate only one parameter configuration at a time, which limits its efficiency.
2. **Introduction to libEnsemble**:
   - libEnsemble is a Python toolkit for coordinating workflows of asynchronous and dynamic ensembles of calculations across massively parallel resources.
   - It helps users exploit massively parallel resources to solve design, decision, and inference problems, and it expands the class of problems that can benefit from increased parallelism.
3. **Integrated Framework ytopt-libe**:
   - Integrating ytopt with libEnsemble yields a new asynchronous autotuning framework, ytopt-libe.
   - The framework is asynchronous in two respects:
     - Asynchronous search: the search does not wait for all evaluation results. As soon as one evaluation completes, the surrogate model is retrained with the new data.
     - Asynchronous evaluation: libEnsemble's asynchronous and dynamic manager/worker scheme evaluates multiple selected parameter configurations simultaneously.
4. **Application Case: OpenMC**:
   - OpenMC has seven tunable parameters, some of which have large ranges (for example, the number of particles in flight ranges from 100,000 to 8 million, with a default setting of 1 million).
   - The ytopt-libe framework is used to autotune OpenMC for performance and energy efficiency.
   - Experimental results on the Crusher supercomputer show improvements of up to 29.49% in FoM and up to 30.44% in EDP.

### Summary

This paper proposes a new asynchronous autotuning framework, ytopt-libe, that integrates ytopt and libEnsemble to accelerate the autotuning of OpenMC, exploit massively parallel resources, and optimize performance and energy efficiency. On Crusher, the framework improves FoM by up to 29.49% and EDP by up to 30.44%.
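
The two asynchronous aspects described above (updating the search as soon as any single evaluation finishes, and evaluating several configurations at once) can be sketched with the Python standard library alone. This is an illustrative toy, not the ytopt-libe implementation: it uses no ytopt or libEnsemble API, the `suggest` function is a crude stand-in for a surrogate model, the objective in `evaluate` is synthetic, and every function name is hypothetical. Only the 100,000 to 8 million range is taken from the text.

```python
import random
from concurrent.futures import FIRST_COMPLETED, ThreadPoolExecutor, wait

# Range of the in-flight particle count quoted in the text.
LOW, HIGH = 100_000, 8_000_000


def evaluate(particles_in_flight):
    """Stand-in for one application run; returns a synthetic cost (lower is better)."""
    toy_optimum = 3_000_000  # arbitrary value chosen for this toy objective
    return abs(particles_in_flight - toy_optimum) / 1_000_000


def suggest(history, rng):
    """Toy stand-in for a surrogate model: random at first, then near the best point."""
    if len(history) < 4:
        return rng.randint(LOW, HIGH)
    best_x, _ = min(history, key=lambda item: item[1])
    candidate = best_x + rng.randint(-500_000, 500_000)
    return min(HIGH, max(LOW, candidate))


def async_search(max_evals=20, workers=4, seed=0):
    """Keep `workers` evaluations in flight; refill a slot whenever one finishes."""
    rng = random.Random(seed)
    history = []   # (configuration, cost) pairs seen so far
    pending = {}   # future -> configuration currently being evaluated
    submitted = 0
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Start by filling every worker with an initial configuration.
        while submitted < min(workers, max_evals):
            x = suggest(history, rng)
            pending[pool.submit(evaluate, x)] = x
            submitted += 1
        while pending:
            # Wake up as soon as any one evaluation completes.
            done, _ = wait(list(pending), return_when=FIRST_COMPLETED)
            for future in done:
                x = pending.pop(future)
                history.append((x, future.result()))
                if submitted < max_evals:
                    # Use the updated history immediately to pick the next point.
                    nxt = suggest(history, rng)
                    pending[pool.submit(evaluate, nxt)] = nxt
                    submitted += 1
    return history
```

Calling `async_search(max_evals=20, workers=4)` returns 20 `(configuration, cost)` pairs. Because completion order depends on thread scheduling, the exact sequence of suggestions may differ between runs even with a fixed seed.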