A Left-Looking Selected Inversion Algorithm and Task Parallelism on Shared Memory Systems

Mathias Jacquelin, Lin, Weile Jia, Yonghua Zhao, Chao Yang
DOI: https://doi.org/10.1145/3149457.3149472
2016-01-01
Abstract: Given a sparse matrix A, the selected inversion algorithm is an efficient method for computing certain selected elements of A^{-1}. These selected elements correspond to all or some of the nonzero elements of the LU factors of A. In many ways, the types of matrix updates performed in the selected inversion algorithm are similar to those performed in LU factorization, although the sequence of operations is different. In the context of LU factorization, it is known that the left-looking and right-looking algorithms exhibit different memory access and data communication patterns, and hence different behavior on shared memory and distributed memory parallel machines. Corresponding to right-looking and left-looking LU factorization, the selected inversion algorithm can be organized as a left-looking or a right-looking algorithm. A parallel right-looking version of the algorithm was developed in [9]; the sequence of operations it performs is similar to that of a left-looking LU factorization algorithm. In this paper, we describe the left-looking variant of the selected inversion algorithm and present an efficient implementation of it for shared memory machines using a task parallel method. We demonstrate that, with the task scheduling features provided by OpenMP 4.0, the left-looking selected inversion algorithm can scale well on both the Intel Haswell multicore architecture and the Intel Knights Landing (KNL) manycore architecture, up to 16 and 64 cores, respectively. On the KNL architecture, we observe that the maximum parallel efficiency achieved by the left-looking selected inversion algorithm can be as high as 62% even when all 64 cores are used, despite the inherently asynchronous computation and communication patterns of sparse matrix operations.
Compared to the right-looking selected inversion algorithm, the left-looking formulation facilitates efficient pipelining of operations along different branches of the elimination tree, and can be a promising candidate for future development of massively parallel selected inversion algorithms on heterogeneous architectures.
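To make the idea of "selected elements" concrete, here is a minimal Python sketch of selected inversion. It assumes a small, dense, symmetric positive definite A, so that A = L D L^T without pivoting stands in for the LU factors. It works one column at a time rather than on supernodes and uses a symbolic fill rule to obtain the nonzero pattern of L. Sweeping the columns from last to first, it applies the identities A^{-1}_{C,j} = -A^{-1}_{C,C} L_{C,j} and A^{-1}_{jj} = 1/D_j - L_{C,j}^T A^{-1}_{C,j}. Here C is the set of rows i > j with L_{ij} structurally nonzero. As a result, the sketch forms only the diagonal entries and the entries on the pattern of L. The names `ldlt` and `selected_inverse` are hypothetical, and the sketch illustrates the general technique. It is not the paper's left-looking, task-parallel implementation.

```python
from typing import Dict, List, Tuple

Matrix = List[List[float]]


def ldlt(a: Matrix) -> Tuple[Matrix, List[float], List[List[int]]]:
    """Dense A = L D L^T without pivoting (assumes A symmetric positive definite).

    Also returns, for each column j, the symbolic row pattern of L below the
    diagonal (original nonzeros plus fill-in), which drives selected inversion.
    """
    n = len(a)
    l = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    d = [0.0] * n
    nz = [[False] * n for _ in range(n)]  # nz[i][j]: L[i][j] structurally nonzero (i > j)
    for j in range(n):
        d[j] = a[j][j] - sum(l[j][k] ** 2 * d[k] for k in range(j))
        for i in range(j + 1, n):
            # Fill rule: L[i][j] is nonzero if A[i][j] is, or if an earlier
            # column k has nonzeros in both rows i and j.
            nz[i][j] = a[i][j] != 0.0 or any(nz[i][k] and nz[j][k] for k in range(j))
            s = a[i][j] - sum(l[i][k] * l[j][k] * d[k] for k in range(j))
            l[i][j] = s / d[j]
    pattern = [[i for i in range(j + 1, n) if nz[i][j]] for j in range(n)]
    return l, d, pattern


def selected_inverse(a: Matrix) -> Dict[Tuple[int, int], float]:
    """Entries of A^{-1} on the diagonal and on the pattern of L (lower triangle)."""
    l, d, pattern = ldlt(a)
    n = len(a)
    ainv: Dict[Tuple[int, int], float] = {}

    def get(i: int, k: int) -> float:
        # A^{-1} is symmetric, so only the lower triangle (row >= col) is stored.
        return ainv[(i, k)] if i >= k else ainv[(k, i)]

    for j in reversed(range(n)):  # sweep from the last column back to the first
        c = pattern[j]
        for i in c:
            # A^{-1}[i, j] = -sum_{k in C} A^{-1}[i, k] * L[k, j]
            ainv[(i, j)] = -sum(get(i, k) * l[k][j] for k in c)
        # A^{-1}[j, j] = 1/D[j] - sum_{k in C} L[k, j] * A^{-1}[k, j]
        ainv[(j, j)] = 1.0 / d[j] - sum(l[k][j] * ainv[(k, j)] for k in c)
    return ainv


if __name__ == "__main__":
    a = [[2.0, -1.0, 0.0], [-1.0, 2.0, -1.0], [0.0, -1.0, 2.0]]
    for (i, j), v in sorted(selected_inverse(a).items()):
        print(i, j, round(v, 6))
```

Consider the tridiagonal matrix in the usage block, whose exact inverse is (1/4)[[3,2,1],[2,4,2],[1,2,3]]. For it, `selected_inverse` returns only the five entries (0,0), (1,0), (1,1), (2,1) and (2,2), with values 0.75, 0.5, 1.0, 0.5 and 0.75 up to rounding. It never computes A^{-1}_{20} = 0.25, because L_{20} is zero. If the recurrence ever needed an entry outside the selected set, the lookup in `get` would raise `KeyError`. That cannot happen here: any two rows in C are themselves connected in the filled pattern of L, so every A^{-1}_{C,C} entry is already available.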
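Pipelining along branches of the elimination tree rests on a dependency fact. Take the standard column recurrence for selected inversion of a symmetric matrix, A^{-1}_{C,j} = -A^{-1}_{C,C} L_{C,j}. The rows in C are all ancestors of j in the elimination tree. Once a column's parent has been processed, the column no longer depends on work in other branches. The Python sketch below assumes the upper-triangle sparsity of A is given as a list of row indices per column. It computes the elimination tree with Liu's ancestor/path-compression loop, the same loop as CSparse's `cs_etree`. It then groups the columns into top-down "waves" that could run concurrently and dispatches each wave to a thread pool with a caller-supplied placeholder task. This level-set schedule is a coarser, hypothetical stand-in for the OpenMP 4.0 task scheduling the paper relies on. The names `etree`, `top_down_waves` and `run_waves` are illustrative only.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List


def etree(upper_rows: List[List[int]]) -> List[int]:
    """Elimination tree of a symmetric matrix (Liu's algorithm with path compression).

    upper_rows[k] lists the row indices i < k with A[i][k] != 0.
    Returns parent[k], or -1 if column k is a root.
    """
    n = len(upper_rows)
    parent = [-1] * n
    ancestor = [-1] * n  # compressed path towards the current root
    for k in range(n):
        for i in upper_rows[k]:
            # Climb from i towards the root, redirecting every visited node to k.
            while i != -1 and i < k:
                nxt = ancestor[i]
                ancestor[i] = k
                if nxt == -1:  # i had no ancestor yet, so k becomes its parent
                    parent[i] = k
                i = nxt
    return parent


def top_down_waves(parent: List[int]) -> List[List[int]]:
    """Group columns by depth: roots first, then their children, and so on."""
    n = len(parent)
    children: List[List[int]] = [[] for _ in range(n)]
    for k, p in enumerate(parent):
        if p != -1:
            children[p].append(k)
    waves = [[k for k in range(n) if parent[k] == -1]]
    while True:
        nxt = [c for k in waves[-1] for c in children[k]]
        if not nxt:
            return waves
        waves.append(nxt)


def run_waves(waves: List[List[int]], task: Callable[[int], None], workers: int = 4) -> None:
    """Run each wave's columns concurrently; a wave starts only after the previous one."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        for wave in waves:
            list(pool.map(task, wave))  # consuming the iterator waits for the whole wave


if __name__ == "__main__":
    upper = [[], [], [0, 1], [], [2, 3]]  # A[0][2], A[1][2], A[2][4], A[3][4] nonzero
    parent = etree(upper)
    print(parent, top_down_waves(parent))
    run_waves(top_down_waves(parent), lambda j: print("process column", j))
```

In the usage block, the elimination tree is parent = [2, 2, 4, 4, -1], and the waves are [[4], [2, 3], [0, 1]]. Columns 2 and 3 sit on different branches, so they may run concurrently once the root column 4 is done. A level-set schedule like this one puts a barrier after every wave. Dependency-driven tasks, such as OpenMP `task` constructs with `depend` clauses, avoid that barrier and let a deep branch keep advancing without waiting for unrelated columns at the same depth. That is closer to the pipelining the abstract attributes to the left-looking formulation.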