Abstract: The International Journal of High Performance Computing Applications, Ahead of Print. ytopt is a Python machine-learning-based autotuning software package developed within the ECP PROTEAS-TUNE project. It adopts an asynchronous search framework that samples a small number of input parameter configurations and progressively fits a surrogate model over the input-output space until the user-defined maximum number of evaluations or the wall-clock time limit is exhausted. libEnsemble is a Python toolkit, developed within the ECP PETSc/TAO project, for coordinating workflows of asynchronous and dynamic ensembles of calculations across massively parallel resources. libEnsemble helps users exploit massively parallel resources to solve design, decision, and inference problems, and it expands the class of problems that can benefit from increased parallelism. In this paper we present our methodology and framework for integrating ytopt and libEnsemble so that massively parallel resources can be used to accelerate the autotuning process. Specifically, we focus on using the proposed framework to autotune the ECP ExaSMR application OpenMC, an open source Monte Carlo particle transport code. OpenMC has seven tunable parameters, some of which have large ranges; for example, the number of particles in flight ranges from 100,000 to 8 million, with a default setting of 1 million. Finding the combination of parameter values that achieves the best performance is extremely time-consuming. Therefore, we apply the proposed framework to autotune the MPI/OpenMP offload version of OpenMC based on a user-defined metric, such as the figure of merit (FoM, in particles/s) or energy efficiency measured by the energy-delay product (EDP), on Crusher at the Oak Ridge Leadership Computing Facility. The experimental results show improvements of up to 29.49% in FoM and up to 30.44% in EDP.
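The search loop described in the abstract (sample a few configurations, fit a surrogate, evaluate the most promising candidate, and stop at an evaluation or wall-clock budget) can be illustrated with a minimal sketch. This is not the ytopt or libEnsemble API: the function names, the nearest-neighbor surrogate, and the toy runtime model are all assumptions made for illustration, and the loop is sequential rather than asynchronous. Only the parameter range (100,000 to 8 million particles in flight) and the definition of EDP as energy multiplied by runtime come from the abstract and from standard usage.

```python
import random
import time


def edp(energy_joules, runtime_seconds):
    """Energy-delay product: energy multiplied by runtime (lower is better)."""
    return energy_joules * runtime_seconds


def toy_runtime(particles_in_flight):
    """Hypothetical stand-in for one application run; NOT measured OpenMC data."""
    return 1.0 + abs(particles_in_flight - 2_000_000) / 1_000_000


def surrogate_predict(history, x, k=3):
    """Assumed surrogate: mean objective of the k nearest evaluated points."""
    nearest = sorted(history, key=lambda h: abs(h[0] - x))[:k]
    return sum(y for _, y in nearest) / len(nearest)


def autotune(objective, low, high, n_initial=5, max_evals=30,
             max_seconds=10.0, n_candidates=200, seed=0):
    """Surrogate-guided search over one integer parameter (minimization)."""
    rng = random.Random(seed)
    start = time.monotonic()

    # Step 1: evaluate a small random sample of configurations.
    history = [(x, objective(x))
               for x in (rng.randint(low, high) for _ in range(n_initial))]

    # Step 2: refit (here: re-query) the surrogate after every evaluation
    # until the evaluation budget or the wall-clock budget is exhausted.
    while len(history) < max_evals and time.monotonic() - start < max_seconds:
        candidates = [rng.randint(low, high) for _ in range(n_candidates)]
        x = min(candidates, key=lambda c: surrogate_predict(history, c))
        history.append((x, objective(x)))

    best = min(history, key=lambda h: h[1])
    return best, history


if __name__ == "__main__":
    best, history = autotune(toy_runtime, 100_000, 8_000_000)
    print("best configuration:", best[0], "objective:", best[1])
    print("evaluations used:", len(history))
```

To tune for energy efficiency instead of runtime, the objective passed to `autotune` would return `edp(energy, runtime)` for each run; to maximize FoM, it would return the negated particles-per-second value, since the sketch minimizes.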

Auto‐Tuning Mixed‐Precision Computation by Specifying Multiple Regions

Automatic Search Guided Code Optimization Framework for Mixed-Precision Scientific Applications

Automatically Tuned Dynamic Programming with an Algorithm-by-Blocks

Mixed Precision Block-Jacobi Preconditioner: Algorithms, Performance Evaluation and Feature Analysis

Multi-Objective Optimization for Floating Point Mix-Precision Tuning

ytopt: Autotuning Scientific Applications for Energy Efficiency at Large Scales

Enabling mixed-precision with the help of tools: A Nekbone case study

A Numerical Model Oriented Large-scale Parallel I/O Optimization Method

Cost-Effective Methodology for Complex Tuning Searches in HPC: Navigating Interdependencies and Dimensionality

Sound Mixed-Precision Optimization with Rewriting

TROPHY: Trust Region Optimization Using a Precision Hierarchy

Mixed-precision Methods to Reconstruct Numerical Ocean Simulations

Exploring and Exploiting Runtime Reconfigurable Floating Point Precision in Scientific Computing: a Case Study for Solving PDEs

APMT: an Automatic Hardware Counter-Based Performance Modeling Tool for HPC Applications

Solving the Global Atmospheric Equations Through Heterogeneous Reconfigurable Platforms

Solving global shallow water equations on heterogeneous supercomputers

Integrating ytopt and libEnsemble to autotune OpenMC

Mixed-Precision Computing in the GRIST Dynamical Core for Weather and Climate Modelling

Automatic Multi-Parameter Performance Modeling of HPC Applications on a New Sunway Supercomputer

Tuning Technique for Multiple Precision Dense Matrix Multiplication using Prediction of Computational Time