Speeding up the GENGA N-body integrator on consumer-grade graphics cards

R. Brasser, S. L. Grimm, P. Hatalova, J. G. Stadel
2023-09-15
Abstract: GPU computing is popular due to the calculation potential of a single card. The N-body integrator GENGA is built for this, but it suffers a performance penalty on consumer-grade GPUs due to their truncated double-precision (FP64) performance. We aim to speed up GENGA on consumer-grade cards by harvesting their high single-precision (FP32) performance. We modified GENGA to be able to compute the long-distance forces between bodies in FP32 precision and tested this with five experiments. First, we ran simulations with similar initial conditions of 6600 planetesimals in both FP32 and FP64 precision. We also ran simulations of i) a mixture of planetesimals and planetary embryos, ii) planetesimal-driven giant planet migration, and iii) terrestrial planet formation with a gas disc. Second, we ran the same simulation beginning with 40 000 planetesimals using both FP32 and FP64 precision forces on a variety of consumer-grade and Tesla GPUs to measure the performance boost of FP32 computing. There are no statistical differences when running in FP32 or FP64 precision that can be attributed to the force prescription rather than stochastic effects. The uncertainties in energy are almost identical with either precision. However, the uncertainty in the angular momentum using FP32 rather than FP64 precision long-range forces is about two orders of magnitude greater, but still very low. Running the simulations in single precision on consumer-grade cards decreases running time by a factor of three and brings them within a factor of three of a Tesla A100 GPU. Additional tuning speeds up the simulation by a factor of two across all types of cards. The option to compute the long-range forces in single precision in GENGA when using consumer-grade GPUs dramatically improves performance at little penalty to accuracy. There is an additional environmental benefit because it reduces energy usage.
Earth and Planetary Astrophysics; Instrumentation and Methods for Astrophysics; Distributed, Parallel, and Cluster Computing; Computational Physics
What problem does this paper attempt to address?
The paper aims to address the following key issues:

### Paper Objectives

1. **Improve the performance of the GENGA N-body integrator on consumer-grade GPUs**:
   - GENGA is a GPU-accelerated N-body integrator, but it suffers a significant performance loss on consumer-grade Nvidia GPUs because of their limited double-precision (FP64) performance.
   - The goal of the paper is to speed up GENGA by exploiting the high single-precision (FP32) performance of these cards.
2. **Evaluate the impact of single-precision computation on simulation accuracy**:
   - Determine whether computing the long-range forces in single precision improves efficiency without significantly degrading energy and angular momentum conservation.
   - Test, through several experiments, whether single-precision and double-precision runs differ statistically, and compare energy and angular momentum conservation across scenarios.

### Method Overview

- **Modify GENGA to support single-precision computation**: GENGA was extended with an option to compute the long-range forces between bodies in FP32.
- **Experimental Design**:
  - Run simulations with similar initial conditions of 6600 planetesimals in both FP32 and FP64 and compare the outcomes statistically.
  - Additional tests cover a mixture of planetesimals and planetary embryos, planetesimal-driven giant planet migration, and terrestrial planet formation with a gas disc.
  - Run the same simulation, beginning with 40 000 planetesimals, on a variety of consumer-grade and Tesla GPUs with both FP32 and FP64 forces to measure the performance gain from FP32.

### Key Findings

- **No statistical differences**: No differences between FP32 and FP64 runs can be attributed to the force prescription rather than to stochastic effects.
- **Similar energy uncertainty**: The uncertainties in energy are almost identical in single and double precision.
- **Differences in angular momentum conservation**: The uncertainty in angular momentum is about two orders of magnitude greater with FP32 long-range forces than with FP64, but it is still very low.
- **Performance improvement**: Running in single precision on consumer-grade cards decreases running time by a factor of three and brings these cards within a factor of three of a Tesla A100 GPU.
- **Additional tuning**: Further tuning speeds up the simulation by a factor of two across all types of cards.

### Conclusion

- Computing the long-range forces in single precision on consumer-grade GPUs dramatically improves GENGA's performance at little penalty to accuracy.
- The approach has an additional environmental benefit because it reduces energy usage.
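
To give a feel for how small the penalty of single-precision long-range forces can be, the following minimal sketch compares direct-sum gravitational accelerations evaluated in FP64 and in emulated FP32. It is not GENGA's kernel: the disc layout, the body count, the masses, the choice G = 1, and all function names (`f32`, `make_disc`, `accelerations`, `relative_difference`) are assumptions made for illustration. FP32 is emulated in pure Python by rounding every intermediate result to single precision with `struct`.

```python
import math
import struct


def f32(x):
    """Round a Python float (FP64) to the nearest FP32 value."""
    return struct.unpack("f", struct.pack("f", x))[0]


def fp64(x):
    """Identity: keep full double precision."""
    return x


def make_disc(n):
    """Hypothetical test layout: n bodies on a golden-angle spiral, 0.5 <= r < 1.5."""
    golden = math.pi * (3.0 - math.sqrt(5.0))
    bodies = []
    for k in range(n):
        r = 0.5 + k / n
        theta = k * golden
        bodies.append((r * math.cos(theta), r * math.sin(theta), 0.01 * math.sin(k)))
    return bodies


def accelerations(pos, mass, rnd):
    """Direct-sum accelerations with G = 1; rnd rounds every intermediate result."""
    pos = [tuple(rnd(c) for c in p) for p in pos]
    mass = [rnd(m) for m in mass]
    acc = []
    for i, (xi, yi, zi) in enumerate(pos):
        ax = ay = az = 0.0
        for j, (xj, yj, zj) in enumerate(pos):
            if i == j:
                continue
            dx, dy, dz = rnd(xj - xi), rnd(yj - yi), rnd(zj - zi)
            r2 = rnd(rnd(rnd(dx * dx) + rnd(dy * dy)) + rnd(dz * dz))
            s = rnd(mass[j] / rnd(r2 * rnd(math.sqrt(r2))))  # m_j / r^3
            ax = rnd(ax + rnd(s * dx))
            ay = rnd(ay + rnd(s * dy))
            az = rnd(az + rnd(s * dz))
        acc.append((ax, ay, az))
    return acc


def relative_difference(a, b):
    """Root-sum-square difference between a and b, normalised by the norm of b."""
    num = sum((p - q) ** 2 for u, v in zip(a, b) for p, q in zip(u, v))
    den = sum(q * q for v in b for q in v)
    return math.sqrt(num / den)


n_bodies = 64                      # illustrative, far fewer than in the paper
pos = make_disc(n_bodies)
mass = [1.0e-9] * n_bodies         # illustrative equal masses
a64 = accelerations(pos, mass, fp64)
a32 = accelerations(pos, mass, f32)
err = relative_difference(a32, a64)
print(f"relative FP32 vs FP64 difference in acceleration: {err:.2e}")
```

The sketch only measures the per-step force error for well-separated bodies. It says nothing about how that error accumulates in energy or angular momentum over a long integration, which is what the paper's experiments assess.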