Abstract: This thesis is concerned with two classical topics in matrix computations: the QR algorithm for solving nonsymmetric eigenvalue problems, and the computation of matrix exponentials for two types of structured matrices. We focus on performance for the former topic and on accuracy for the latter. For computing all eigenvalues of a non-Hermitian matrix, the QR algorithm, which iteratively computes a Schur decomposition of the matrix, is the method of choice. We present a new parallel implementation of the multishift QR algorithm targeting distributed memory architectures. Starting from recent developments of the parallel multishift QR algorithm, we propose a number of algorithmic and implementation improvements. Guidelines concerning several important tunable algorithmic parameters are also provided. Numerous computational experiments confirm that our new implementation significantly outperforms previous parallel implementations of the QR algorithm. The computation of the exponential of a square matrix is also an important task in matrix computations. For a general dense matrix, the scaling and squaring method coupled with Padé approximation is the most popular approach. However, for an essentially nonnegative matrix (a real square matrix with nonnegative off-diagonal entries), a truncated Taylor series is preferred over Padé approximation in order to achieve componentwise accuracy in the matrix exponential. We propose a method which efficiently computes all entries of the exponential of an essentially nonnegative matrix to high relative accuracy. Truncation and rounding error bounds, as well as numerical experiments, demonstrate the efficiency and accuracy of our method. When the matrix is banded, the entries of its matrix exponential decay exponentially away from the main diagonal. We analyze the decay property for the exponentials of several classes of doubly-infinite skew-Hermitian matrices. Finite section methods based on the decay property are then established. We also propose a repeated doubling strategy which works well even when a priori error estimates are pessimistic or not easy to compute. Finally, numerical experiments are presented to illustrate the effectiveness of the finite section method.
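To make the remark about Taylor series and essentially nonnegative matrices concrete, here is a minimal Python sketch. It is not the method proposed in the thesis, only an illustration of the general idea under stated assumptions. With alpha taken as the smallest diagonal entry, B = A - alpha*I is entrywise nonnegative and exp(A) = e^alpha * exp(B), so the Taylor sum and the repeated squarings involve only nonnegative numbers and no subtractive cancellation. The function names, the fixed number of Taylor terms, and the choice of the scaling power are all illustrative assumptions.

```python
import math


def matmul(X, Y):
    """Multiply two square matrices stored as lists of rows."""
    n = len(X)
    return [[sum(X[i][k] * Y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def expm_essentially_nonnegative(A, terms=20):
    """Illustrative shifted Taylor scaling-and-squaring (hypothetical helper).

    Assumes A is square with nonnegative off-diagonal entries.
    The number of Taylor terms is a fixed, assumed parameter.
    """
    n = len(A)
    if any(A[i][j] < 0 for i in range(n) for j in range(n) if i != j):
        raise ValueError("off-diagonal entries must be nonnegative")

    # Shift by the smallest diagonal entry so that B = A - alpha*I >= 0.
    alpha = min(A[i][i] for i in range(n))
    B = [[A[i][j] - alpha if i == j else float(A[i][j]) for j in range(n)]
         for i in range(n)]

    # Scaling: pick s so that the 1-norm of B / 2**s is at most 1.
    norm1 = max(sum(B[i][j] for i in range(n)) for j in range(n))
    s = 0
    while norm1 > 1.0:
        norm1 /= 2.0
        s += 1
    C = [[b / 2.0 ** s for b in row] for row in B]

    # Truncated Taylor series of exp(C); every term is entrywise nonnegative.
    E = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    term = [row[:] for row in E]
    for k in range(1, terms + 1):
        term = [[t / k for t in row] for row in matmul(term, C)]
        E = [[E[i][j] + term[i][j] for j in range(n)] for i in range(n)]

    # Squaring: exp(B) = exp(C) ** (2 ** s).
    for _ in range(s):
        E = matmul(E, E)

    # Undo the shift: exp(A) = e^alpha * exp(B).
    factor = math.exp(alpha)
    return [[factor * e for e in row] for row in E]
```

For the upper triangular matrix `[[-1, 2], [0, -3]]` the exact exponential is known in closed form (diagonal entries e^-1 and e^-3, off-diagonal entry e^-1 - e^-3), which gives a simple check of the sketch.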

A 3D Parallel Algorithm for QR Decomposition

Implementing Communication-Optimal Parallel and Sequential QR Factorizations

Parallel Tiled QR Factorization for Multicore Architectures

Fast Moving Window Algorithm for QR and Cholesky Decompositions

Research on Multi-Level Parallel Algorithm of GPU Based QR Decomposition

On aggressive early deflation in parallel variants of the QR algorithm

A Novel Architecture to Eliminate Bottlenecks in a Parallel Tiled QRD Algorithm for Future MIMO Systems.

Communication-optimal parallel and sequential QR and LU factorizations: theory and practice

Parallel Row Operation Algorithm for Banded Linear Systems

QR Decomposition Architecture Using the Iteration Look-Ahead Modified Gram-Schmidt Algorithm.

A Parallel Iterative Method for Solving Periodical Block-Tridiagonal Linear Equations.

CholeskyQR with Randomization and Pivoting for Tall Matrices (CQRRPT)

Algorithm 953

A Communication-Avoiding Parallel Algorithm for the Symmetric Eigenvalue Problem

A Unified Co-Processor Architecture for Matrix Decomposition.

Adaptive Parallelizable Algorithms for Interpolative Decompositions via Partially Pivoted LU

Algorithm 1019: A Task-based Multi-shift QR/QZ Algorithm with Aggressive Early Deflation

Coded Computing for Fault-Tolerant Parallel QR Decomposition

Modelling the Runtime of the IQMR Method for Large and Sparse Linear Systems on Parallel Computers

Dense and Structured Matrix Computations —the Parallel QR Algorithm and Matrix Exponentials

QR factorization of ill-conditioned tall-and-skinny matrices on distributed-memory systems