Auto‐Tuning Mixed‐Precision Computation by Specifying Multiple Regions
Xuanzhengbo Ren, Masatoshi Kawai, Tetsuya Hoshino, Takahiro Katagiri, Toru Nagai
DOI: https://doi.org/10.1002/cpe.8326
2024-11-08
Concurrency and Computation: Practice and Experience
Abstract: Mixed‐precision computation is a promising method for substantially improving the performance of high‐performance computing applications. However, using mixed‐precision data is a double‐edged sword: while it can improve computational performance, the reduction in precision introduces additional uncertainty and error. Precision tuning is therefore necessary to determine the optimal mixed‐precision configuration, and much effort is spent on selecting appropriate variables while balancing execution time against numerical accuracy. Auto‐tuning (AT) is one of the technologies that can alleviate this labor‐intensive task. Recently, ppOpen‐AT, an AT language, introduced a directive for mixed‐precision tuning called "Blocks." In this study, we investigated an AT strategy that uses the "Blocks" directive to tune multiple regions of a program. The non‐hydrostatic icosahedral atmospheric model (NICAM), a global cloud‐resolving model, was used as a benchmark program to evaluate the effectiveness of the AT strategy. Experimental results indicated that when individual regions of the program each performed well in mixed‐precision computation, combining these regions resulted in better performance. When tested on the Type I (Fujitsu PRIMEHPC FX1000) and Type II (Fujitsu PRIMEHPC CX1000) subsystems of the supercomputer "Flow," the mixed‐precision NICAM benchmark program tuned by the AT strategy achieved a speedup of nearly 1.31× on the Type I subsystem and 1.12× on the Type II subsystem, both relative to the original double‐precision program.
computer science, theory & methods, software engineering
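
The multi‐region idea summarized in the abstract (tune each region on its own, then combine the regions that did well) can be illustrated with a minimal Python sketch. This is not ppOpen‐AT syntax and not the authors' implementation: the region names, the toy cost model `toy_evaluate`, the timing and error numbers, and the tolerance are all hypothetical and chosen only to make the example self‐contained.

```python
from typing import Callable, FrozenSet, Dict, List, Tuple

# Hypothetical toy cost model: (seconds saved, error added) when a region
# is switched from double to single precision. Illustrative numbers only.
TOY_REGIONS: Dict[str, Tuple[float, float]] = {
    "region_a": (0.30, 1e-7),
    "region_b": (0.10, 5e-4),   # faster, but too inaccurate
    "region_c": (-0.05, 1e-8),  # accurate, but slower (conversion overhead)
    "region_d": (0.20, 2e-7),
}
BASELINE_TIME = 2.0  # assumed run time of the all-double-precision program

Evaluate = Callable[[FrozenSet[str]], Tuple[float, float]]


def toy_evaluate(single: FrozenSet[str]) -> Tuple[float, float]:
    """Return (run time, error) when the given regions use single precision."""
    time = BASELINE_TIME - sum(TOY_REGIONS[r][0] for r in single)
    error = sum(TOY_REGIONS[r][1] for r in single)
    return time, error


def tune_regions(regions: List[str], evaluate: Evaluate,
                 tolerance: float) -> Tuple[FrozenSet[str], float]:
    """Tune each region alone, then try the combination of the good ones."""
    base_time, _ = evaluate(frozenset())

    # Step 1: measure each region in isolation; keep those that are both
    # faster than the baseline and within the error tolerance.
    good: Dict[str, float] = {}
    for region in regions:
        time, error = evaluate(frozenset([region]))
        if time < base_time and error <= tolerance:
            good[region] = time

    if not good:
        return frozenset(), base_time

    # Step 2: combine all individually good regions and re-measure.
    combined = frozenset(good)
    time, error = evaluate(combined)
    best_single = min(good, key=good.get)
    if error <= tolerance and time <= good[best_single]:
        return combined, time

    # Fallback: the combination did not help, so keep the best single region.
    return frozenset([best_single]), good[best_single]


chosen, tuned_time = tune_regions(list(TOY_REGIONS), toy_evaluate, 1e-6)
print(sorted(chosen), BASELINE_TIME / tuned_time)
```

In a real setting the evaluate callback would build and run the program with the selected regions in reduced precision and compare its output with a double‐precision reference; the sketch only shows the shape of the search, which needs one measurement per region plus one for the combination rather than one per subset of regions.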