Parallel execution time analysis for least squares problems on distributed memory architectures

Laurence Tianruo Yang, Richard P. Brent
2001-01-01
Abstract: In this paper we study the parallelization of PCGLS, a basic iterative method whose main idea is to organize the computation of the preconditioned conjugate gradient method applied to the normal equations. Two important questions are discussed: what is the best possible data distribution, and which communication network topology is most suitable for solving least squares problems on massively parallel distributed memory computers? A theoretical model of the data distribution and communication phases is presented, which allows us to give a detailed execution time complexity analysis and to investigate its usefulness. It is shown that implementing PCGLS with a row-block decomposition of the coefficient matrix on a ring communication structure is the most efficient choice. Performance tests of the developed parallel PCGLS algorithm have been carried out on the massively parallel distributed memory system Parsytec, and the experimental timing results are compared with the theoretical execution time complexity analysis.
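To make the structure of the method concrete, the following is a minimal serial sketch of a PCGLS iteration with a simulated row-block decomposition. It is an illustration under stated assumptions, not the implementation analysed in the paper: the diagonal preconditioner built from column norms, the helper names (`split_rows`, `allreduce_sum`, `pcgls`) and the stopping rule are all hypothetical choices, and the coefficient matrix is assumed to have full column rank. The sketch shows where a row-block layout keeps work local (the product of a block with a vector) and where a global sum over all blocks is needed (the transposed product and the inner products), which is the kind of communication phase a ring would carry.

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def matvec(rows, x):
    """Local product of one row block with a full-length vector."""
    return [dot(row, x) for row in rows]

def rmatvec(rows, y, n):
    """Local contribution rows^T y of one row block (length n)."""
    out = [0.0] * n
    for row, yi in zip(rows, y):
        for j, a in enumerate(row):
            out[j] += a * yi
    return out

def split_rows(A, b, nblocks):
    """Hypothetical row-block partition: contiguous chunks of rows of A and b."""
    size = -(-len(A) // nblocks)  # ceiling division
    return [(A[i:i + size], b[i:i + size]) for i in range(0, len(A), size)]

def allreduce_sum(parts):
    """Serial stand-in for a global vector sum (e.g. passed around a ring)."""
    return [sum(vals) for vals in zip(*parts)]

def scaled_normal_residual(A_blocks, r_blocks, d, n):
    """s = M^{-T} A^T r: local transposed products, one global sum, scaling."""
    g = allreduce_sum([rmatvec(Ab, rb, n) for Ab, rb in zip(A_blocks, r_blocks)])
    return [gj / dj for gj, dj in zip(g, d)]

def pcgls(A, b, nblocks=2, tol=1e-12, max_iter=100):
    n = len(A[0])
    blocks = split_rows(A, b, nblocks)
    A_blocks = [Ab for Ab, _ in blocks]
    # Assumed preconditioner: M = diag(column 2-norms of A).
    col_sq = allreduce_sum(
        [[sum(row[j] ** 2 for row in Ab) for j in range(n)] for Ab in A_blocks]
    )
    d = [math.sqrt(c) or 1.0 for c in col_sq]
    x = [0.0] * n
    r_blocks = [list(bb) for _, bb in blocks]  # r = b - A x with x = 0
    s = scaled_normal_residual(A_blocks, r_blocks, d, n)
    p = list(s)
    gamma = gamma0 = dot(s, s)
    for _ in range(max_iter):
        if gamma <= tol * tol * gamma0:
            break
        t = [pj / dj for pj, dj in zip(p, d)]           # t = M^{-1} p
        q_blocks = [matvec(Ab, t) for Ab in A_blocks]   # q = A t, local per block
        qq = sum(dot(qb, qb) for qb in q_blocks)        # global scalar sum
        alpha = gamma / qq
        x = [xj + alpha * tj for xj, tj in zip(x, t)]
        r_blocks = [[ri - alpha * qi for ri, qi in zip(rb, qb)]
                    for rb, qb in zip(r_blocks, q_blocks)]
        s = scaled_normal_residual(A_blocks, r_blocks, d, n)
        gamma_new = dot(s, s)
        beta = gamma_new / gamma
        p = [sj + beta * pj for sj, pj in zip(s, p)]
        gamma = gamma_new
    return x

# Fit a line through three points: minimise ||A x - b||_2.
A = [[1.0, 0.0], [1.0, 1.0], [1.0, 2.0]]
b = [1.0, 2.0, 2.0]
x = pcgls(A, b)
```

In this layout each block only ever needs its own rows of the matrix and residual; the vectors of length n are replicated, so the per-iteration communication reduces to the global sums marked above.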