Abstract: Linear solvers are major computational bottlenecks in a wide range of decision support and optimization computations. The challenges become even more pronounced on heterogeneous hardware, where traditional sparse numerical linear algebra methods are often inefficient. For example, methods for solving ill-conditioned linear systems have relied on conditional branching, which degrades performance on hardware accelerators such as graphics processing units (GPUs). To improve the efficiency of solving ill-conditioned systems, our computational strategy separates computations that are efficient on GPUs from those that need to run on traditional central processing units (CPUs). Our strategy maximizes the reuse of expensive CPU computations. Iterative methods, which thus far have not been broadly used for ill-conditioned linear systems, play an important role in our approach. In particular, we extend ideas from [1] to implement iterative refinement using inexact LU factors and flexible generalized minimal residual (FGMRES), with the aim of efficient performance on GPUs. We focus on solutions that are effective within broader application contexts, and discuss how early performance tests could be improved to be more predictive of the performance in a realistic environment.

What problem does this paper attempt to address?

The paper addresses the computational bottlenecks encountered when solving ill-conditioned linear systems in nonlinear constrained optimization, particularly on heterogeneous hardware with accelerators such as Graphics Processing Units (GPUs). Specifically, it covers the following key points:

1. **Computational Bottleneck**: Linear solvers are major computational bottlenecks in a wide range of decision support and optimization computations. Traditional sparse numerical linear algebra methods are often inefficient on heterogeneous hardware.
2. **Computational Efficiency on Heterogeneous Hardware**: Methods for ill-conditioned systems have relied on conditional branching, which degrades performance on GPUs, so new strategies are needed to solve such systems efficiently.
3. **Combination of Direct and Iterative Methods**: The paper proposes a strategy that combines direct and iterative methods. It separates the computations that run efficiently on GPUs from those that must run on Central Processing Units (CPUs), and it maximizes the reuse of expensive CPU computations.
4. **Iterative Refinement Method**: The paper extends ideas from previous work to implement iterative refinement using inexact LU factors and Flexible Generalized Minimal Residual (FGMRES), with the aim of efficient performance on GPUs.
5. **Performance Evaluation in Practical Applications**: The paper focuses on solutions that are effective within broader application contexts and discusses how early performance tests could be improved to better predict performance in realistic environments.

In summary, the main contributions of the paper include a novel iterative refinement method, a sparse linear solver on the GPU, a performance analysis of linear solvers on the GPU, and a comparison with existing solvers. These results are expected to improve the efficiency of linear solves both within specific software stacks (such as power system analysis) and on standalone test linear systems.
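The idea of refining a solution with inexact LU factors and FGMRES can be illustrated with a small sketch. This is not the paper's implementation: it is a dense, pure-Python toy that runs on the CPU, and every name in it (`lu_factor`, `lu_solve`, `fgmres`, `A_old`, `A_new`) is hypothetical. It assumes the LU factors are "inexact" because they were computed for a nearby earlier matrix and are reused as a preconditioner, which is one plausible reading of reusing expensive CPU computations. The preconditioner is fixed here, so the flexible variant coincides with right-preconditioned GMRES.

```python
import math

def lu_factor(A):
    """Doolittle LU without pivoting; L (unit diagonal) and U share one matrix."""
    n = len(A)
    LU = [row[:] for row in A]
    for k in range(n):
        for i in range(k + 1, n):
            LU[i][k] /= LU[k][k]
            for j in range(k + 1, n):
                LU[i][j] -= LU[i][k] * LU[k][j]
    return LU

def lu_solve(LU, b):
    """Solve L U x = b by forward, then backward substitution."""
    n = len(b)
    x = b[:]
    for i in range(n):
        x[i] -= sum(LU[i][j] * x[j] for j in range(i))
    for i in reversed(range(n)):
        x[i] = (x[i] - sum(LU[i][j] * x[j] for j in range(i + 1, n))) / LU[i][i]
    return x

def matvec(A, x):
    return [sum(a * xj for a, xj in zip(row, x)) for row in A]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def fgmres(A, b, apply_precond, tol=1e-10, max_iter=20):
    """Flexible GMRES from a zero initial guess, without restarts.
    Returns the approximate solution and the number of iterations used."""
    beta = math.sqrt(dot(b, b))
    if beta == 0.0:
        return [0.0] * len(b), 0
    V = [[bi / beta for bi in b]]  # orthonormal Krylov basis
    Z = []                         # preconditioned directions (stored explicitly)
    H = []                         # columns of the rotated Hessenberg matrix
    cs, sn = [], []                # Givens rotation coefficients
    g = [beta]                     # rotated right-hand side of the least-squares problem
    for k in range(max_iter):
        Z.append(apply_precond(V[k]))
        w = matvec(A, Z[k])
        h = []
        for v in V:                # modified Gram-Schmidt
            hij = dot(w, v)
            h.append(hij)
            w = [wi - hij * vi for wi, vi in zip(w, v)]
        h_next = math.sqrt(dot(w, w))
        for i in range(k):         # apply earlier rotations to the new column
            t = cs[i] * h[i] + sn[i] * h[i + 1]
            h[i + 1] = -sn[i] * h[i] + cs[i] * h[i + 1]
            h[i] = t
        r = math.hypot(h[k], h_next)
        c, s = h[k] / r, h_next / r
        cs.append(c)
        sn.append(s)
        h[k] = r
        H.append(h)
        g.append(-s * g[k])
        g[k] = c * g[k]
        if abs(g[k + 1]) <= tol * beta or h_next == 0.0:
            break
        V.append([wi / h_next for wi in w])
    m = len(H)
    y = [0.0] * m
    for i in reversed(range(m)):   # back substitution on the triangular system
        y[i] = (g[i] - sum(H[j][i] * y[j] for j in range(i + 1, m))) / H[i][i]
    x = [sum(y[j] * Z[j][i] for j in range(m)) for i in range(len(b))]
    return x, m

# Assumed scenario: factor an earlier matrix once, then reuse the factors
# as an inexact solver for a slightly different matrix.
A_old = [[4.0, 1.0, 0.0, 0.0], [1.0, 4.0, 1.0, 0.0],
         [0.0, 1.0, 4.0, 1.0], [0.0, 0.0, 1.0, 4.0]]
A_new = [row[:] for row in A_old]
for i, d in enumerate([0.2, -0.1, 0.3, 0.1]):
    A_new[i][i] += d               # small change relative to the factored matrix

LU_old = lu_factor(A_old)          # the expensive step, done once
b = [1.0, 2.0, 3.0, 4.0]
x, iters = fgmres(A_new, b, lambda v: lu_solve(LU_old, v))
residual = max(abs(bi - ai) for bi, ai in zip(b, matvec(A_new, x)))
print(iters, residual)
```

The triangular solves and matrix-vector products inside the loop are free of data-dependent branching, which is the kind of work the abstract describes as suited to GPUs, while the factorization is the part that would be reused rather than recomputed.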

Iterative Methods in GPU-Resident Linear Solvers for Nonlinear Constrained Optimization