Solving unsymmetric sparse systems of linear equations with PARDISO

Olaf Schenk, Klaus Gärtner

DOI: https://doi.org/10.1016/j.future.2003.07.011

IF: 7.307

2004-04-01

Future Generation Computer Systems

Abstract: Supernode partitioning for unsymmetric matrices together with complete block diagonal supernode pivoting and asynchronous computation can achieve high gigaflop rates for parallel sparse LU factorization on shared memory parallel computers. The progress in weighted graph matching algorithms helps to extend these concepts further and unsymmetric prepermutation of rows is used to place large matrix entries on the diagonal. Complete block diagonal supernode pivoting allows dynamical interchanges of columns and rows during the factorization process. The level-3 BLAS efficiency is retained and an advanced two-level left–right looking scheduling scheme results in good speedup on SMP machines. These algorithms have been integrated into the recent unsymmetric version of the PARDISO solver. Experiments demonstrate that a wide set of unsymmetric linear systems can be solved and high performance is consistently achieved for large sparse unsymmetric matrices from real world applications.
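The row prepermutation mentioned in the abstract can be illustrated on a tiny dense matrix. The sketch below is not PARDISO's algorithm: it assumes the objective is to maximize the product of the diagonal magnitudes, and it replaces the weighted matching algorithm with a brute-force search over all row permutations, which is only feasible for very small matrices. The function names `best_row_permutation` and `permute_rows` are hypothetical.

```python
from itertools import permutations
from math import prod


def best_row_permutation(a):
    """Find the row order that maximizes the product of |diagonal| entries.

    perm[j] is the original row placed at position j. Brute force over all
    n! permutations: an illustration of the objective, not a practical
    matching algorithm.
    """
    n = len(a)
    best_perm, best_score = None, -1.0
    for perm in permutations(range(n)):
        score = prod(abs(a[perm[j]][j]) for j in range(n))
        if score > best_score:
            best_perm, best_score = perm, score
    return list(best_perm), best_score


def permute_rows(a, perm):
    """Return a copy of `a` with row perm[j] moved to position j."""
    return [list(a[i]) for i in perm]


# Small example with zeros on the original diagonal.
A = [
    [0.0, 2.0, 0.0],
    [5.0, 0.1, 0.0],
    [0.0, 1.0, 4.0],
]
perm, score = best_row_permutation(A)
B = permute_rows(A, perm)
print(perm, score)                  # chosen row order and diagonal product
print([B[i][i] for i in range(3)])  # diagonal after permutation
```

After the permutation every diagonal entry of `B` is nonzero and large relative to its column, which is the property that reduces the need for pivoting later in the factorization.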

What problem does this paper attempt to address?

This paper aims to make the solution of unsymmetric sparse linear systems more efficient and more robust, in particular by achieving efficient parallel sparse LU factorization on shared-memory multiprocessor architectures. It focuses on the following aspects:

1. **Efficient parallel solution of unsymmetric sparse systems**: The authors propose a method for parallel LU factorization of unsymmetric sparse matrices that improves computational performance and avoids the changes in dependency relationships that partial pivoting causes.
2. **Improved scalability and robustness**: To scale better and behave more robustly on shared-memory multiprocessors, the paper replaces partial pivoting with complete block diagonal supernode pivoting, which improves both the stability and the efficiency of the algorithm.
3. **Static computation of the task-dependency graph**: Combining an unsymmetric row permutation with complete block diagonal supernode pivoting allows the task-dependency graph to be computed statically, before the numerical factorization, which simplifies synchronization in the parallel code.
4. **High-performance implementation**: The paper describes how level-3 BLAS efficiency is retained and how a two-level left-right looking scheduling scheme yields good speedup.

In summary, the core objective of the paper is to develop a parallel direct solver that solves large unsymmetric sparse linear systems efficiently and reliably, and to demonstrate its performance experimentally on matrices from real-world applications.
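To make the idea of pivoting inside a diagonal block concrete, here is a minimal sketch of LU factorization with complete pivoting on one small dense block. It assumes the block stands in for the dense diagonal block of a single supernode. Because row and column interchanges stay inside the block, the sparsity structure outside it is untouched, which is what allows a dependency graph to be fixed in advance. The function name `lu_complete_pivoting` is hypothetical, and the code is plain Python rather than the level-3 BLAS kernels a real solver would use.

```python
def lu_complete_pivoting(block):
    """Factor a small dense block so that block[rows[i]][cols[j]] = (L*U)[i][j].

    Complete pivoting: at step k the largest-magnitude entry of the trailing
    submatrix is moved to position (k, k) by one row and one column swap.
    """
    n = len(block)
    a = [[float(x) for x in row] for row in block]
    rows, cols = list(range(n)), list(range(n))
    for k in range(n):
        # Locate the largest-magnitude entry in the trailing submatrix.
        p, q = max(
            ((i, j) for i in range(k, n) for j in range(k, n)),
            key=lambda ij: abs(a[ij[0]][ij[1]]),
        )
        if a[p][q] == 0.0:
            raise ZeroDivisionError("block is singular")
        # Row interchange (whole rows, including stored multipliers).
        a[k], a[p] = a[p], a[k]
        rows[k], rows[p] = rows[p], rows[k]
        # Column interchange (applied to every row).
        for r in a:
            r[k], r[q] = r[q], r[k]
        cols[k], cols[q] = cols[q], cols[k]
        # Eliminate below the pivot, storing multipliers in place.
        for i in range(k + 1, n):
            a[i][k] /= a[k][k]
            for j in range(k + 1, n):
                a[i][j] -= a[i][k] * a[k][j]
    L = [[a[i][j] if j < i else (1.0 if i == j else 0.0) for j in range(n)]
         for i in range(n)]
    U = [[a[i][j] if j >= i else 0.0 for j in range(n)] for i in range(n)]
    return rows, cols, L, U


# A block whose (0, 0) entry is tiny, so factoring without pivoting is unsafe.
block = [
    [1e-8, 1.0, 2.0],
    [3.0, 4.0, 1.0],
    [2.0, 1.0, 5.0],
]
rows, cols, L, U = lu_complete_pivoting(block)
print(rows, cols)
```

With complete pivoting every multiplier stored in `L` has magnitude at most 1, so the tiny leading entry never becomes a pivot.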

PSelInv -- A Distributed Memory Parallel Algorithm for Selected Inversion : the Symmetric Case

Adaptive Parallelizable Algorithms for Interpolative Decompositions via Partially Pivoted LU

Domain Decomposition Based High Performance Parallel Computing

PSelInv - A Distributed Memory Parallel Algorithm for Selected Inversion: the non-symmetric Case

An Experimental Study of Two-Level Schwarz Domain Decomposition Preconditioners on GPUs

On Parallel Solution of Sparse Triangular Linear Systems in CUDA

A Robust Algebraic Domain Decomposition Preconditioner for Sparse Normal Equations

Structured Semidefinite Programming for Recovering Structured Preconditioners

Skew-Symmetric Matrix Decompositions on Shared-Memory Architectures

Deinsum: Practically I/O Optimal Multilinear Algebra

$O(N)$ distributed direct factorization of structured dense matrices using runtime systems

Batched sparse direct solver design and evaluation in SuperLU_DIST

Evaluating Accuracy and Efficiency of HPC Solvers for Sparse Linear Systems with Applications to PDEs

Basker: A Threaded Sparse LU Factorization Utilizing Hierarchical Parallelism and Data Layouts

Caveats of three direct linear solvers for finite element analyses

PSCToolkit: solving sparse linear systems with a large number of GPUs

A sparse-sparse iteration for computing a sparse incomplete factorization of the inverse of an SPD matrix

Parallel computing studies of flexible multibody system dynamics using OpenMP and Pardiso

On the Parallel I/O Optimality of Linear Algebra Kernels: Near-Optimal Matrix Factorizations

A Two-level GPU-Accelerated Incomplete LU Preconditioner for General Sparse Linear Systems