Abstract: The International Journal of High Performance Computing Applications, Ahead of Print. ytopt is a Python machine-learning-based autotuning software package developed within the ECP PROTEAS-TUNE project. It adopts an asynchronous search framework that samples a small number of input parameter configurations and progressively fits a surrogate model over the input-output space until the user-defined maximum number of evaluations or the wall-clock time limit is exhausted. libEnsemble, developed within the ECP PETSc/TAO project, is a Python toolkit for coordinating workflows of asynchronous and dynamic ensembles of calculations across massively parallel resources. libEnsemble helps users take advantage of massively parallel resources to solve design, decision, and inference problems, and it expands the class of problems that can benefit from increased parallelism. In this paper we present our methodology and framework for integrating ytopt and libEnsemble to take advantage of massively parallel resources and accelerate the autotuning process. Specifically, we focus on using the proposed framework to autotune the ECP ExaSMR application OpenMC, an open source Monte Carlo particle transport code. OpenMC has seven tunable parameters, some of which have large ranges; for example, the number of particles in-flight ranges from 100,000 to 8 million, with a default setting of 1 million. Finding the combination of parameter values that achieves the best performance is extremely time-consuming. Therefore, we apply the proposed framework to autotune the MPI/OpenMP offload version of OpenMC based on a user-defined metric, such as the figure of merit (FoM, in particles/s) or the energy-efficiency metric energy-delay product (EDP), on Crusher at the Oak Ridge Leadership Computing Facility. The experimental results show improvements of up to 29.49% in FoM and up to 30.44% in EDP.
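
The search loop described in the abstract (sample a few configurations, then repeatedly use a surrogate model to choose the next one until an evaluation or wall-clock budget runs out) can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: `toy_runtime` is a made-up objective rather than OpenMC, the nearest-neighbour `surrogate_predict` merely stands in for whatever model ytopt actually fits, the function names are hypothetical and not part of the ytopt or libEnsemble APIs, and the loop is sequential, whereas the framework in the paper is asynchronous and parallel.

```python
import random
import time

# Hypothetical one-parameter search space, loosely modelled on the
# "particles in-flight" range quoted in the abstract (100,000 to 8 million).
LOW, HIGH = 100_000, 8_000_000


def toy_runtime(particles):
    """Stand-in objective (NOT OpenMC): a made-up curve with its minimum at 3 million."""
    return 1.0 + ((particles - 3_000_000) / 1_000_000) ** 2


def surrogate_predict(history, x, k=3):
    """Toy surrogate: mean objective of the k already-evaluated points nearest to x."""
    nearest = sorted(history, key=lambda h: abs(h[0] - x))[:k]
    return sum(y for _, y in nearest) / len(nearest)


def autotune(objective, max_evals=30, max_seconds=5.0, n_initial=5, seed=0):
    """Minimise `objective` under an evaluation budget and a wall-clock budget."""
    rng = random.Random(seed)
    start = time.monotonic()

    # Step 1: evaluate a small number of randomly sampled configurations.
    history = []
    for _ in range(min(n_initial, max_evals)):
        x = rng.randint(LOW, HIGH)
        history.append((x, objective(x)))

    # Step 2: until either budget is exhausted, score random candidates with
    # the surrogate (which always sees the full, growing history), evaluate
    # the most promising one, and record the result.
    while len(history) < max_evals and time.monotonic() - start < max_seconds:
        candidates = [rng.randint(LOW, HIGH) for _ in range(200)]
        x = min(candidates, key=lambda c: surrogate_predict(history, c))
        history.append((x, objective(x)))

    best = min(history, key=lambda h: h[1])
    return best, history


if __name__ == "__main__":
    (best_x, best_y), evaluated = autotune(toy_runtime)
    print(f"best particles in-flight: {best_x}, objective: {best_y:.3f}, "
          f"evaluations: {len(evaluated)}")
```

To tune for a maximised metric such as FoM with this sketch, one would minimise its negative; an EDP objective would return measured energy multiplied by runtime, which is already a quantity to minimise.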

An Autotuning Protocol to Rapidly Build Autotuners

ytopt: Autotuning Scientific Applications for Energy Efficiency at Large Scales

HAOTuner: A Hardware Adaptive Operator Auto-Tuner for Dynamic Shape Tensor Compilers

Fast: A Fast Stencil Autotuning Framework Based On An Optimal-Solution Space Model

Parallel computing based parameter auto-tuning algorithm for optimization solvers

Towards a Benchmarking Suite for Kernel Tuners

Integrating ytopt and libEnsemble to autotune OpenMC

Software Autotuning for Sustainable Performance Portability

FTuner: A Fast Dynamic Shape Tensors Program Auto-Tuner for Deep Learning Compilers

MindOpt Tuner: Boost the Performance of Numerical Software by Automatic Parameter Tuning

GPTuner: A Manual-Reading Database Tuning System via GPT-Guided Bayesian Optimization

Adaptive Auto-Tuning Framework for Global Exploration of Stencil Optimization on GPUs

Compiler Autotuning through Multiple Phase Learning

AntTune: An Efficient Distributed Hyperparameter Optimization System for Large-Scale Data

A Parallel Bandit-Based Approach for Autotuning FPGA Compilation

Auptimizer -- an Extensible, Open-Source Framework for Hyperparameter Tuning

Compiler Auto-tuning through Multiple Phase Learning

Hyper-Tune: Towards Efficient Hyper-parameter Tuning at Scale

ML$^2$Tuner: Efficient Code Tuning via Multi-Level Machine Learning Models

Enhancing Online Index Tuning with a Learned Tuning Diagnostic