Low-synchronization Arnoldi Methods for the Matrix Exponential with Application to Exponential Integrators

Tanya Tafolla, Stéphane Gaudreault, Mayya Tokman
2024-10-19
Abstract: High-order exponential integrators require computing linear combinations of exponential-like $\varphi$-functions of a large matrix $A$ times a vector $v$. Krylov projection methods are the most general and remain an efficient choice for evaluating matrix-function-vector products when the matrix $A$ is large and cannot be stored explicitly, or when obtaining information about the spectrum is expensive. The Krylov approximation relies on the Gram-Schmidt (GS) orthogonalization procedure to produce the orthonormal basis $V_m$. In parallel, GS orthogonalization requires global synchronizations for the inner products and vector normalization in the orthogonalization process. Reducing the number of global synchronizations is of paramount importance for the efficiency of a numerical algorithm in a massively parallel setting. We improve the parallel strong scaling properties of exponential integrators by addressing the underlying bottleneck in the linear algebra using low-synchronization GS methods. The resulting orthogonalization algorithms have an accuracy comparable to modified Gram-Schmidt yet are better suited for distributed architectures, as only one global communication is required per orthogonalization step. We present geophysics-based numerical experiments and standard examples routinely used to test stiff time integrators, which validate that reducing global communication leads to better parallel scalability and reduced time-to-solution for exponential integrators.
Numerical Analysis
What problem does this paper attempt to address?
This paper aims to make matrix exponential computations more efficient in large-scale parallel computing environments, in particular for high-order exponential integrators applied to stiff systems. Specifically, the authors reduce the global synchronization required by the Arnoldi method, which improves parallel strong scaling.

### Problem Background

1. **Applications of Exponential Integrators**
   Exponential integrators are an efficient class of numerical methods for large-scale stiff systems. Such systems arise throughout science and engineering, for example in the time integration of partial differential equations (PDEs). Because stiff systems span a wide range of time scales, the integrator must take large time steps without losing stability for the computation to be efficient.
2. **Bottlenecks of Krylov Subspace Methods**
   Krylov subspace methods are an effective choice for computing matrix-function-vector products (such as products with \( \varphi \)-functions), especially for large sparse matrices. However, the traditional Arnoldi iteration depends on Gram-Schmidt orthogonalization, which requires a large amount of global communication in parallel and becomes a performance bottleneck.
3. **Impact of Global Synchronization**
   On distributed-memory parallel systems, each global synchronization (such as an Allreduce operation in MPI) incurs significant communication overhead, and the cost grows with the number of processors. Reducing the number of global synchronizations is therefore crucial for the parallel scalability of the algorithm.

### Solution

The authors propose a low-synchronization Arnoldi method that reduces global communication by restructuring the orthogonalization process. Specific measures include:

- **Low-Synchronization Gram-Schmidt Orthogonalization**: The orthogonalization is redesigned so that the number of global synchronizations at iteration \( j \) drops from \( j + 1 \) to one.
- **Hybrid Low-Synchronization Algorithm**: Techniques such as projection onto the orthogonal complement space, delayed normalization, and norm estimation are combined to preserve the stability and accuracy of the algorithm.
- **Numerical Experiment Verification**: A series of numerical experiments demonstrates the advantages of the new method in parallel scalability and computational efficiency, especially on large numbers of processors.

### Paper Contributions

The paper shows that optimizing the orthogonalization process in Krylov subspace methods improves the performance of exponential integrators in large-scale parallel computing. This improves the parallel scalability of the algorithm and reduces the time-to-solution for stiff systems.

### Summary

The core objective of the paper is to improve the efficiency and scalability of exponential integrators in large-scale parallel environments by reducing global synchronization in the Arnoldi method. The low-synchronization orthogonalization algorithms address the communication bottleneck of the traditional approach while keeping an accuracy comparable to modified Gram-Schmidt.
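
The synchronization counts mentioned above can be illustrated with a small serial sketch. It is not the authors' algorithm: as a simplified stand-in, it fuses all inner products of a classical Gram-Schmidt step into one matrix-vector product and obtains the norm from the Pythagorean identity, without the delayed normalization or norm estimation described in the summary. The function names (`arnoldi`, `expm_small`, `krylov_expm`), the counter `n_sync`, and the 1D Laplacian test matrix are assumptions made for illustration; each counted "reduction" marks where a distributed code would need one global Allreduce.

```python
import numpy as np


def arnoldi(A, v, m, low_sync=False):
    """Run m Arnoldi steps; return (V_m, H_m, beta, number of global reductions).

    Illustrative only: no breakdown handling, serial NumPy instead of MPI.
    """
    n = v.size
    V = np.zeros((n, m + 1))
    H = np.zeros((m + 1, m))
    beta = np.linalg.norm(v)  # one reduction to normalize the start vector
    V[:, 0] = v / beta
    n_sync = 1
    for j in range(m):
        w = A @ V[:, j]
        Vj = V[:, : j + 1]
        if low_sync:
            # One fused reduction returns V_j^T w and w^T w together.
            r = np.column_stack([Vj, w]).T @ w
            n_sync += 1
            h, ww = r[:-1], r[-1]
            w = w - Vj @ h
            # Norm from the Pythagorean identity instead of a second reduction.
            nrm = np.sqrt(max(ww - h @ h, 0.0))
        else:
            # Modified Gram-Schmidt: one reduction per inner product ...
            h = np.zeros(j + 1)
            for i in range(j + 1):
                h[i] = Vj[:, i] @ w
                w = w - h[i] * Vj[:, i]
                n_sync += 1
            # ... plus one more for the norm.
            nrm = np.linalg.norm(w)
            n_sync += 1
        H[: j + 1, j] = h
        H[j + 1, j] = nrm
        V[:, j + 1] = w / nrm
    return V[:, :m], H[:m, :m], beta, n_sync


def expm_small(H, terms=18):
    """Scaling-and-squaring Taylor series; a demo helper for small dense H."""
    s = int(np.ceil(np.log2(max(np.linalg.norm(H, 1), 1.0)))) + 1
    S = H / 2.0**s
    E = np.eye(H.shape[0])
    term = np.eye(H.shape[0])
    for k in range(1, terms + 1):
        term = term @ S / k
        E = E + term
    for _ in range(s):
        E = E @ E
    return E


def krylov_expm(A, v, m, low_sync=False):
    """Krylov approximation exp(A) v ~ beta * V_m exp(H_m) e_1."""
    V, H, beta, n_sync = arnoldi(A, v, m, low_sync)
    return beta * (V @ expm_small(H)[:, 0]), n_sync


# Hypothetical test problem: negative 1D discrete Laplacian (symmetric).
n, m = 200, 20
A = -(2.0 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1))
v = np.random.default_rng(0).standard_normal(n)

# Reference exp(A) v from a full eigendecomposition (feasible only for small n).
lam, Q = np.linalg.eigh(A)
ref = Q @ (np.exp(lam) * (Q.T @ v))

x_mgs, sync_mgs = krylov_expm(A, v, m, low_sync=False)
x_low, sync_low = krylov_expm(A, v, m, low_sync=True)
err_mgs = np.linalg.norm(x_mgs - ref) / np.linalg.norm(ref)
err_low = np.linalg.norm(x_low - ref) / np.linalg.norm(ref)
print(f"MGS:      {sync_mgs} reductions, relative error {err_mgs:.1e}")
print(f"low-sync: {sync_low} reductions, relative error {err_low:.1e}")
```

With `m = 20` the counter reports 231 reductions for modified Gram-Schmidt and 21 for the fused variant, which is the gap that matters at scale. The Pythagorean norm used here can lose accuracy when the subtraction cancels heavily, so this sketch should be read as a way to see where the synchronizations go, not as a substitute for the stabilized algorithms in the paper.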