What problem does this paper attempt to address?

This paper addresses how to solve nonsymmetric linear systems more efficiently on parallel computers. Specifically, the author improves two iterative methods, Orthomin and GMRES (Generalized Minimal Residual Method), to increase data locality and parallelism, thereby reducing the number of memory accesses and improving computing efficiency.

### Problem Background

In large-scale scientific computing, especially numerical simulation on supercomputers, solving large sparse linear systems is a common and crucial task. For the nonsymmetric linear system \(Ax = f\), commonly used iterative methods such as Orthomin and GMRES are effective, but in a parallel computing environment they are limited by frequent memory accesses and synchronization operations, both of which reduce computing efficiency.

### Solution

To address these limitations, the author proposes the s-step Orthomin and s-step GMRES methods. These methods improve on the standard iterations in the following ways:

1. **Enhancing Data Locality**: The s-step method forms s search-direction vectors simultaneously, which reduces the number of accesses to main memory and improves data locality.
2. **Increasing Parallelism**: The s-step method allows multiple inner products to be computed simultaneously, reducing the need for global communication and further improving parallel efficiency.
3. **Reducing the Number of Memory Accesses**: By reorganizing the matrix-vector multiplications and linear combinations, the s-step method reduces the number of memory accesses to \(\frac{1}{s}\) of that of the standard method.

### Theoretical Derivation and Implementation

The author derives the s-step Arnoldi method in detail and, on this basis, derives the s-step GMRES method. The author then implements these methods on the Cray-2 hierarchical-memory supercomputer and conducts numerical tests to verify their effectiveness.

### Numerical Experiment Results

Through a series of numerical experiments, the author reports the convergence and execution time of the s-step methods on grids of different sizes. The results show that the s-step methods significantly reduce execution time while maintaining convergence behavior similar to that of the standard methods. The improvement is most pronounced when BLAS3 modules and local-memory optimization are used.

### Conclusion

In general, this paper improves the efficiency of solving nonsymmetric linear systems on parallel computers, especially hierarchical-memory supercomputers, by proposing the s-step Orthomin and GMRES methods. These methods improve data locality and increase parallelism, providing a useful reference for later high-performance computing work.

### Formula Summary

- **Linear System**: \(Ax = f\)
- **Initial Residual**: \(r_0 = f - Ax_0\)
- **Krylov Subspace**: \(K_m = \text{span}\{r_0, Ar_0, \dots, A^{m - 1}r_0\}\)
- **Residual Minimization**: \(\min_{x \in x_0 + K_m}\|f - Ax\|_2\)
- **GMRES Least-Squares Problem** (at step \(j\)): \(\min_{z \in K_j}\|f - A[x_0 + z]\|_2 = \min_{z \in K_j}\|r_0 - Az\|_2\)
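To make the residual-minimization formulas concrete, the following is a minimal NumPy sketch of one block step: it builds \(s\) Krylov vectors from the current residual, obtains every required inner product from a single matrix-matrix product, and solves the small least-squares problem \(\min_{z \in K_s}\|r_0 - Az\|_2\) through its normal equations. This is an illustration under stated assumptions, not the paper's algorithm: the function names, the monomial basis \([r, Ar, \dots, A^{s-1}r]\), the test matrix, and the restart-after-every-block strategy are all choices made here for brevity. In exact arithmetic each block step matches one cycle of restarted GMRES(s); the monomial basis becomes ill-conditioned as \(s\) grows, so a practical code would treat it more carefully.

```python
import numpy as np

def krylov_block(A, r, s):
    """Return P = [r, Ar, ..., A^(s-1) r] and AP = A @ P (monomial basis)."""
    n = r.shape[0]
    K = np.empty((n, s + 1))
    K[:, 0] = r
    for j in range(s):
        K[:, j + 1] = A @ K[:, j]
    # Columns 1..s of K are exactly A times columns 0..s-1.
    return K[:, :s], K[:, 1:]

def s_step_update(A, f, x, s):
    """One block step: minimize ||f - A (x + P y)||_2 over y in R^s."""
    r = f - A @ x
    P, AP = krylov_block(A, r, s)
    # One matrix-matrix product yields all s*s + s inner products at once,
    # instead of s*s + s separate dot products with separate reductions.
    G = AP.T @ np.column_stack([AP, r])
    W, g = G[:, :s], G[:, s]
    # Normal equations of the small least-squares problem min_y ||r - AP y||_2.
    y = np.linalg.solve(W, g)
    return x + P @ y

# Hypothetical test problem: a well-conditioned nonsymmetric matrix.
rng = np.random.default_rng(0)
n, s = 50, 4
A = np.eye(n) + 0.25 * rng.standard_normal((n, n)) / np.sqrt(n)
f = rng.standard_normal(n)

x = np.zeros(n)
res = [np.linalg.norm(f - A @ x)]
for _ in range(3):
    x = s_step_update(A, f, x, s)
    res.append(np.linalg.norm(f - A @ x))

print(["%.3e" % v for v in res])
```

The design point to notice is the single product `AP.T @ [AP, r]`: batching the inner products this way is what lets an s-step formulation use BLAS3-style kernels and one global reduction per block rather than one per vector.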

s-Step Orthomin and GMRES implemented on parallel computers

A numerically stable communication-avoiding s-step GMRES algorithm

Two-Stage Block Orthogonalization to Improve Performance of $s$-step GMRES

Parallel Computation Of Meshfree Methods For Extremely Large Deformation Analysis

On the backward stability of s-step GMRES

A Parallel Function Evaluation Approach for Solution to Large-Scale Equation-Oriented Models.

Improving the Performance of the GMRES Method using Mixed-Precision Techniques

Utilizing Cuda For Preconditioned Gmres Solvers

A Study of Mixed Precision Strategies for GMRES on GPUs

Parallel Computation Method For Solving Large Scale Equation-Oriented Models

A spectrally preconditioned and initially deflated variant of the restarted block GMRES method for solving multiple right-hand sides linear systems

Compressed Basis GMRES on High Performance GPUs

GMRES with randomized sketching and deflated restarting

A Nonlinear GMRES Optimization Algorithm for Canonical Tensor Decomposition

Preprocessed GMRES for fast solution of linear equations

Performance Evaluation of Parallel Gram-Schmidt Re-orthogonalization Methods

Iterative Methods in GPU-Resident Linear Solvers for Nonlinear Constrained Optimization

Block GMRES Method with Inexact Breakdowns and Deflated Restarting

Achieving high performance and portable parallel GMRES algorithm for compressible flow simulations on unstructured grids

Optimal Solutions of Well-Posed Linear Systems via Low-Precision Right-Preconditioned GMRES with Forward and Backward Stabilization

On IGMRES: an Incomplete Generalized Minimal Residual Method for Large Unsymmetric Linear Systems