Enabling mixed-precision with the help of tools: A Nekbone case study

Yanxiang Chen, Pablo de Oliveira Castro, Paolo Bientinesi, Roman Iakymchuk

2024-05-18

Abstract: Mixed-precision computing has the potential to significantly reduce the cost of exascale computations, but determining when and how to implement it in programs can be challenging. In this article, we consider Nekbone, a mini-application for the CFD solver Nek5000, as a case study, and propose a methodology for enabling mixed precision with the help of computer arithmetic tools and the roofline model. We evaluate the derived mixed-precision program by combining metrics in three dimensions: accuracy, time-to-solution, and energy-to-solution. Notably, introducing mixed precision in Nekbone reduces time-to-solution by 40.7% and energy-to-solution by 47% on 128 MPI ranks.
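The roofline model mentioned in the abstract bounds a kernel's attainable performance by the smaller of the machine's peak compute rate and its memory bandwidth multiplied by the kernel's arithmetic intensity (flops per byte moved). The sketch below illustrates why halving the size of each operand can help a memory-bound kernel. All machine and kernel numbers are hypothetical placeholders chosen for illustration, not measurements from the paper.

```python
def attainable_gflops(peak_gflops, bandwidth_gbs, flops_per_element, bytes_per_element):
    """Roofline bound: min(peak compute, bandwidth * arithmetic intensity)."""
    intensity = flops_per_element / bytes_per_element  # flops per byte moved
    return min(peak_gflops, bandwidth_gbs * intensity)


# Hypothetical machine: these values are assumptions, not taken from the paper.
PEAK_FP64_GFLOPS = 1000.0
PEAK_FP32_GFLOPS = 2000.0
BANDWIDTH_GBS = 100.0

# Hypothetical kernel: one multiply and one add per element loaded.
FLOPS_PER_ELEMENT = 2.0

# Double precision moves 8 bytes per element, single precision moves 4.
fp64_bound = attainable_gflops(PEAK_FP64_GFLOPS, BANDWIDTH_GBS, FLOPS_PER_ELEMENT, 8)
fp32_bound = attainable_gflops(PEAK_FP32_GFLOPS, BANDWIDTH_GBS, FLOPS_PER_ELEMENT, 4)

print(f"FP64 bound: {fp64_bound} GFLOP/s, FP32 bound: {fp32_bound} GFLOP/s")
```

With these assumed numbers both variants are limited by memory bandwidth, so the single-precision bound is twice the double-precision one. A compute-bound kernel would instead be limited by the peak rates.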

Mathematical Software, Distributed, Parallel, and Cluster Computing, Software Engineering

What problem does this paper attempt to address?

The paper addresses how to use mixed-precision computation effectively in scientific computing applications, particularly in high-performance computing (HPC) environments. It studies Nekbone, a mini-application for the fluid dynamics solver Nek5000, and proposes a methodology for evaluating and implementing mixed-precision computation in order to reduce computational cost, time-to-solution, and energy consumption. The main contributions of the paper are:

1. A tool-assisted approach that enables application developers to use computer arithmetic tools to evaluate and optimize the precision requirements of floating-point operations.
2. An analysis of Nekbone with the Verificarlo tool to identify parts of the code where precision can be reduced. Monte Carlo arithmetic is used to simulate fluctuations in floating-point operations and to assess the accuracy of reduced-precision computations.
3. In two typical examples, a careful mix of double-precision and single-precision computations allowed the solver to run entirely in single precision, resulting in up to a 41% reduction in time-to-solution and a 47% reduction in energy-to-solution.

Through this work, the paper demonstrates how mixed-precision computation can significantly improve performance and energy efficiency without sacrificing accuracy, which is particularly important for next-generation supercomputers such as exascale systems.
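The idea behind Monte Carlo arithmetic can be sketched in a few lines: each floating-point result is perturbed by random noise at a chosen virtual precision of t bits, the computation is repeated many times, and the spread of the results gives an estimate of how many significant digits survive. The code below is a simplified pure-Python model of that idea applied to a dot product. It is not Verificarlo's implementation, and the function names, the test vectors, and the sample count are all assumptions made for illustration.

```python
import math
import random
import statistics


def perturb(x, t, rng):
    """Add uniform noise of magnitude 2**(e - t) to x, where e is x's exponent."""
    if x == 0.0:
        return 0.0
    _, e = math.frexp(x)          # x = m * 2**e with 0.5 <= |m| < 1
    xi = rng.uniform(-0.5, 0.5)   # random draw in [-1/2, 1/2]
    return x + math.ldexp(xi, e - t)


def noisy_dot(a, b, t, rng):
    """Dot product in which every multiply and add is perturbed at t bits."""
    acc = 0.0
    for x, y in zip(a, b):
        prod = perturb(x * y, t, rng)
        acc = perturb(acc + prod, t, rng)
    return acc


def significant_digits(samples):
    """Estimate decimal significant digits as -log10(sigma / |mu|)."""
    mu = statistics.mean(samples)
    sigma = statistics.stdev(samples)
    if sigma == 0.0:
        return float("inf")
    return -math.log10(sigma / abs(mu))


def estimate_digits(t, n_samples=200, seed=0):
    """Run the noisy dot product n_samples times at virtual precision t."""
    rng = random.Random(seed)
    a = [1.0 / (i + 1) for i in range(100)]  # hypothetical test data
    b = [1.0] * 100
    samples = [noisy_dot(a, b, t, rng) for _ in range(n_samples)]
    return significant_digits(samples)


digits_t24 = estimate_digits(24)  # 24 bits: single-precision-like significand
digits_t11 = estimate_digits(11)  # 11 bits: half-precision-like significand
print(f"t=24: {digits_t24:.2f} digits, t=11: {digits_t11:.2f} digits")
```

Because the sketch runs on top of double precision, it is only meaningful for virtual precisions well below 53 bits. Comparing the digit estimate against the accuracy the application needs is one way to judge whether a region of code is a candidate for reduced precision.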

Mixed-precision finite element kernels and assembly: Rounding error analysis and hardware acceleration

Mixed-Precision In-Memory Computing

A Study of Mixed Precision Strategies for GMRES on GPUs

Auto‐Tuning Mixed‐Precision Computation by Specifying Multiple Regions

Leveraging Mixed Precision in Exponential Time Integration Methods

Accelerating and Tuning Small Matrix Multiplications on Sunway TaihuLight: A Case Study of Spectral Element CFD Code Nek5000

Exploring and Exploiting Runtime Reconfigurable Floating Point Precision in Scientific Computing: a Case Study for Solving PDEs

Sound Mixed-Precision Optimization with Rewriting

HipBone: A performance-portable GPU-accelerated C++ version of the NekBone benchmark

Benchmarking mixed-mode PETSc performance on high-performance architectures

Fast Sound Error Bounds for Mixed-Precision Real Evaluation

Multi-Objective Optimization for Floating Point Mix-Precision Tuning

Precision-Aware Iterative Algorithms Based on Group-Shared Exponents of Floating-Point Numbers

NEAT: A Framework for Automated Exploration of Floating Point Approximations

Mixed precision in Graphics Processing Unit

Speeding up and reducing memory usage for scientific machine learning via mixed precision

Numerical eigen-spectrum slicing, accurate orthogonal eigen-basis, and mixed-precision eigenvalue refinement using OpenMP data-dependent tasks and accelerator offload

Automatic Search Guided Code Optimization Framework for Mixed-Precision Scientific Applications

Cluster-level tuning of a shallow water equation solver on the Intel MIC architecture