Efficient N-to-M Checkpointing Algorithm for Finite Element Simulations

David A. Ham, Vaclav Hapla, Matthew G. Knepley, Lawrence Mitchell, Koki Sagiyama
2024-10-31
Abstract: In this work, we introduce a new algorithm for N-to-M checkpointing in finite element simulations. The algorithm allows efficient saving and loading of functions that represent physical quantities defined on the mesh of the physical domain. Specifically, it allows data saved on N parallel processes to be loaded on a different number of processes, M, so that restarting and post-processing can each run on the process count appropriate to the given phase of the simulation and other conditions. For demonstration, we implemented this algorithm in PETSc, the Portable, Extensible Toolkit for Scientific Computation, and added a convenient high-level interface to Firedrake, a system for solving partial differential equations using finite element methods. We evaluated our new implementation by saving and loading data involving 8.2 billion finite element degrees of freedom using 8,192 parallel processes on ARCHER2, the UK National Supercomputing Service.
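The core idea of N-to-M checkpointing can be illustrated with a toy, single-process Python sketch. It is not the PETSc/Firedrake algorithm: it assumes (hypothetically) that every saved value already carries a process-independent global ID from 0 to total-1, whereas a real implementation must also store the mesh and map each value to the right entity and degree of freedom of a possibly differently distributed mesh. Writers scatter values into a file laid out by global ID; readers each read a contiguous block and then exchange values so that each ends up with the IDs its own partition needs. The names `block_range`, `save`, and `load` are illustrative only.

```python
"""Toy N-to-M checkpoint redistribution (illustrative sketch only).

Hypothetical assumption: each value has a process-independent global ID in
[0, total), used exactly once. This is NOT the paper's PETSc/Firedrake code.
"""
from typing import Dict, List


def block_range(rank: int, nranks: int, total: int) -> range:
    """Contiguous slice of [0, total) assigned to `rank` out of `nranks`."""
    base, extra = divmod(total, nranks)
    start = rank * base + min(rank, extra)
    stop = start + base + (1 if rank < extra else 0)
    return range(start, stop)


def save(parts: List[Dict[int, float]]) -> List[float]:
    """N writer 'ranks' place their owned values at their global IDs,
    producing a file layout that does not depend on N."""
    total = sum(len(p) for p in parts)
    disk = [0.0] * total
    for part in parts:
        for gid, value in part.items():
            disk[gid] = value
    return disk


def load(disk: List[float], owners: List[List[int]]) -> List[Dict[int, float]]:
    """M reader 'ranks' (M = len(owners)) first read contiguous blocks,
    then fetch the values for the global IDs their new partition owns."""
    m, total = len(owners), len(disk)
    # Step 1: block read; each reader touches one contiguous file region.
    chunks = [{gid: disk[gid] for gid in block_range(r, m, total)}
              for r in range(m)]
    # Step 2: stand-in for an MPI all-to-all exchange between readers.
    result = []
    for wanted in owners:
        mine = {}
        for gid in wanted:
            holder = next(r for r in range(m) if gid in block_range(r, m, total))
            mine[gid] = chunks[holder][gid]
        result.append(mine)
    return result


if __name__ == "__main__":
    # Save with N = 3 writers, each owning two global IDs.
    writers = [{0: 1.0, 3: 4.0}, {1: 2.0, 4: 5.0}, {2: 3.0, 5: 6.0}]
    disk = save(writers)
    print(disk)     # [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
    # Reload on M = 2 readers with a different ownership pattern.
    readers = load(disk, owners=[[5, 0, 2], [1, 3, 4]])
    print(readers)  # [{5: 6.0, 0: 1.0, 2: 3.0}, {1: 2.0, 3: 4.0, 4: 5.0}]
```

Splitting the load into a block read followed by a redistribution step keeps file access independent of both N and M; only the in-memory exchange depends on how the new partition is laid out.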
Distributed, Parallel, and Cluster Computing; Mathematical Software