s-Step Orthomin and GMRES implemented on parallel computers

A.T. Chronopoulos, S.K. Kim
DOI: https://doi.org/10.48550/arXiv.2001.04886
2020-01-27
Abstract: The Orthomin (Omin) and the Generalized Minimal Residual method (GMRES) are commonly used iterative methods for approximating the solution of non-symmetric linear systems. The s-step generalizations of these methods enhance their data locality and parallel properties by forming s simultaneous search direction vectors. Good data locality is the key to achieving near-peak rates on memory hierarchical supercomputers. The theoretical derivation of the s-step Arnoldi and Omin has been published in the past. Here we derive the s-step GMRES method. We then implement s-step Omin and GMRES on a Cray-2 hierarchical memory supercomputer.
Numerical Analysis; Distributed, Parallel, and Cluster Computing
What problem does this paper attempt to address?
The paper asks how non-symmetric linear systems can be solved more efficiently on parallel computers. Specifically, the authors work with two iterative methods, Orthomin and GMRES (the Generalized Minimal Residual method), and use their s-step variants to improve data locality and parallelism, which reduces memory traffic and raises computational efficiency.

### Problem Background

In large-scale scientific computing, especially numerical simulation on supercomputers, solving large sparse linear systems is a common and crucial task. For a non-symmetric linear system \(Ax = f\), standard iterative methods such as Orthomin and GMRES are effective, but in a parallel environment they are limited by frequent memory accesses and synchronization operations, both of which reduce efficiency.

### Solution

To address these limitations, the authors use the s-step Orthomin and s-step GMRES methods. These improve on the standard iterations in the following ways:

1. **Enhancing Data Locality**: By forming s search direction vectors simultaneously, the s-step methods reduce the number of accesses to main memory.
2. **Increasing Parallelism**: Multiple inner products can be computed at the same time, which reduces the need for global communication and synchronization.
3. **Reducing the Number of Memory Accesses**: By restructuring the matrix-vector multiplications and linear combinations, the s-step methods reduce the number of memory accesses to \(\frac{1}{s}\) of that of the standard methods.

### Theoretical Derivation and Implementation

Building on the previously published s-step Arnoldi method, the authors derive the s-step GMRES method. They then implement s-step Orthomin and s-step GMRES on the Cray-2 hierarchical-memory supercomputer and run numerical tests to verify their effectiveness.

### Numerical Experiment Results

The numerical experiments report the convergence and execution time of the s-step methods on grids of different sizes. The results show that the s-step methods significantly reduce execution time while converging similarly to the standard methods. The improvement is most pronounced when BLAS3 modules and local-memory optimization are used.

### Conclusion

In summary, the paper addresses the efficiency of solving non-symmetric linear systems on parallel computers, in particular hierarchical-memory supercomputers, by means of the s-step Orthomin and GMRES methods. These methods improve both data locality and parallelism, and they offer a useful reference for later high-performance computing work.

### Formula Summary

- **Linear System**: \(Ax = f\)
- **Initial Residual**: \(r_0 = f - Ax_0\)
- **Krylov Subspace**: \(K_m = \text{span}\{r_0, Ar_0, \dots, A^{m-1}r_0\}\)
- **Residual Minimization**: \(\min_{x \in x_0 + K_m}\|f - Ax\|_2\)
- **GMRES Least-Squares Problem**: \(\min_{z \in K_j}\|f - A[x_0 + z]\|_2 = \min_{z \in K_j}\|r_0 - Az\|_2\)
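
The residual minimization over a Krylov subspace, and the point about computing several inner products at once, can be illustrated with a small NumPy sketch. This is not the authors' s-step GMRES or Orthomin algorithm. It assumes a dense test matrix, the plain monomial basis \(r_0, Ar_0, \dots, A^{s-1}r_0\), and a generic least-squares solver, and it writes the right-hand side as `f`. The helper names `krylov_block`, `batched_inner_products`, and `minimize_residual` are hypothetical and chosen only for this illustration.

```python
import numpy as np

def krylov_block(A, r0, s):
    """Return the n-by-s matrix [r0, A r0, ..., A^(s-1) r0] (monomial basis)."""
    K = np.empty((r0.shape[0], s))
    K[:, 0] = r0
    for j in range(1, s):
        K[:, j] = A @ K[:, j - 1]
    return K

def batched_inner_products(P, Q):
    """All inner products between columns of P and Q in one matrix product.

    A single matrix-matrix product (the BLAS3-style operation) replaces
    P.shape[1] * Q.shape[1] separate dot products.
    """
    return P.T @ Q

def minimize_residual(A, f, x0, s):
    """Minimize ||f - A(x0 + z)||_2 over z in span{r0, ..., A^(s-1) r0}."""
    r0 = f - A @ x0
    K = krylov_block(A, r0, s)
    AK = A @ K
    # z = K y, so the problem becomes min_y ||r0 - AK y||_2.
    y, *_ = np.linalg.lstsq(AK, r0, rcond=None)
    x = x0 + K @ y
    return x, f - A @ x, AK

# Small non-symmetric tridiagonal test matrix (an assumption for the demo).
n, s = 50, 4
A = 4.0 * np.eye(n) - 1.5 * np.eye(n, k=-1) - 0.5 * np.eye(n, k=1)
rng = np.random.default_rng(0)
f = rng.standard_normal(n)
x0 = np.zeros(n)

r0 = f - A @ x0
x, r, AK = minimize_residual(A, f, x0, s)

# Gram matrix computed in one shot versus one dot product at a time.
G_batched = batched_inner_products(AK, AK)
G_looped = np.array([[AK[:, i] @ AK[:, j] for j in range(s)] for i in range(s)])

print("initial residual norm:", np.linalg.norm(r0))
print("residual norm after one block step:", np.linalg.norm(r))
```

The monomial basis becomes ill-conditioned as s grows, so this sketch is only meant for small s. The batched Gram matrix shows why grouping s directions turns many separate, synchronizing dot products into one block operation.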