Abstract: This thesis is concerned with two classical topics in matrix computations: the QR algorithm for solving nonsymmetric eigenvalue problems, and the computation of matrix exponentials for two types of structured matrices. We focus on performance for the former topic and on accuracy for the latter. For computing all eigenvalues of a non-Hermitian matrix, the QR algorithm, which iteratively computes a Schur decomposition of the matrix, is the method of choice. We present a new parallel implementation of the multishift QR algorithm targeting distributed memory architectures. Starting from recent developments of the parallel multishift QR algorithm, we propose a number of algorithmic and implementation improvements. Guidelines concerning several important tunable algorithmic parameters are also provided. Numerous computational experiments confirm that our new implementation significantly outperforms previous parallel implementations of the QR algorithm. The computation of the exponential of a square matrix is also an important task in matrix computations. For a general dense matrix, the scaling and squaring method coupled with Padé approximation is the most popular approach. However, for an essentially nonnegative matrix (a real square matrix with nonnegative off-diagonal entries), a truncated Taylor series is preferred over Padé approximation in order to achieve componentwise accuracy in the matrix exponential. We propose a method that efficiently computes all entries of the exponential of an essentially nonnegative matrix to high relative accuracy. Truncation and rounding error bounds, as well as numerical experiments, demonstrate the efficiency and accuracy of our method. When the matrix is banded, the entries of its matrix exponential decay exponentially away from the main diagonal. We analyze this decay property for the exponentials of several classes of doubly-infinite skew-Hermitian matrices. Finite section methods based on the decay property are then established. We also propose a repeated doubling strategy that works well even when a priori error estimates are pessimistic or not easy to compute. Finally, numerical experiments are presented to illustrate the effectiveness of the finite section method.
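To illustrate why a truncated Taylor series suits essentially nonnegative matrices, here is a minimal Python sketch of a shifted scaling and squaring scheme. It is not the thesis's algorithm: the function names `matmul` and `expm_essentially_nonnegative`, the fixed number of Taylor terms, and the choice of scaling are all assumptions made for illustration. The idea is that after the shift B = A + alpha*I every entry of B is nonnegative, so the Taylor sum and the repeated squaring involve only additions and multiplications of nonnegative numbers and no subtractive cancellation.

```python
import math


def matmul(X, Y):
    """Plain dense matrix product for lists of lists (hypothetical helper)."""
    inner, cols = len(Y), len(Y[0])
    return [[sum(row[k] * Y[k][j] for k in range(inner)) for j in range(cols)]
            for row in X]


def expm_essentially_nonnegative(A, terms=20):
    """Sketch: exp(A) for a real matrix with nonnegative off-diagonal entries.

    Assumption: `terms` is a fixed truncation order chosen for illustration,
    not a value derived from an error bound.
    """
    n = len(A)
    # Shift so that B = A + alpha*I is entrywise nonnegative.
    alpha = max(0.0, max(-A[i][i] for i in range(n)))
    B = [[A[i][j] + (alpha if i == j else 0.0) for j in range(n)]
         for i in range(n)]
    # Scale by 2^s so that the infinity norm of C = B / 2^s is at most 1.
    norm = max(sum(row) for row in B)
    s = max(0, math.ceil(math.log2(norm))) if norm > 0 else 0
    C = [[b / 2.0 ** s for b in row] for row in B]
    # Truncated Taylor series: every term is entrywise nonnegative.
    T = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    term = [row[:] for row in T]
    for k in range(1, terms + 1):
        term = [[x / k for x in row] for row in matmul(term, C)]
        T = [[T[i][j] + term[i][j] for j in range(n)] for i in range(n)]
    # Undo the scaling by squaring s times.
    for _ in range(s):
        T = matmul(T, T)
    # Undo the shift: exp(A) = exp(-alpha) * exp(B).
    factor = math.exp(-alpha)
    return [[factor * t for t in row] for row in T]


E = expm_essentially_nonnegative([[-1.0, 1.0], [0.0, -2.0]])
```

For this 2-by-2 upper triangular example the exact exponential has diagonal entries exp(-1) and exp(-2) and off-diagonal entry exp(-1) - exp(-2), which gives a quick sanity check. A practical implementation would pick the truncation order and scaling from error bounds and would guard against overflow or underflow in the final rescaling, neither of which this sketch attempts.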

On aggressive early deflation in parallel variants of the QR algorithm

Algorithm 953: Parallel Library Software for the Multishift QR Algorithm with Aggressive Early Deflation

Algorithm 1019: A Task-based Multi-shift QR/QZ Algorithm with Aggressive Early Deflation

Fast Moving Window Algorithm for QR and Cholesky Decompositions

Improving data locality of the nonsymmetric QR algorithm

A 3D Parallel Algorithm for QR Decomposition

Implementing Communication-Optimal Parallel and Sequential QR Factorizations

Dense and Structured Matrix Computations —the Parallel QR Algorithm and Matrix Exponentials

Parallel Tiled QR Factorization for Multicore Architectures

A Quadratically Convergent QR-like Method Without Shifts for the Hermitian Eigenvalue Problem

Global Convergence of Hessenberg Shifted QR I: Exact Arithmetic

A Communication-Avoiding Parallel Algorithm for the Symmetric Eigenvalue Problem

Block Size Selection of Parallel LU and QR on PVP-based and RISC-based Supercomputers

Research on Multi-Level Parallel Algorithm of GPU Based QR Decomposition

CholeskyQR with Randomization and Pivoting for Tall Matrices (CQRRPT)

Revisiting the performance optimization of QR factorization on Intel KNL and SKL multiprocessors

Communication-optimal parallel and sequential QR and LU factorizations: theory and practice

QR factorization of ill-conditioned tall-and-skinny matrices on distributed-memory systems

Adaptive Parallelizable Algorithms for Interpolative Decompositions via Partially Pivoted LU

Analysis of Randomized Householder-Cholesky QR Factorization with Multisketching