Parallel least squares problems on massively distributed memory computers

Babak Falsafi, Samuel Midkiff, Jack B Dennis, Amol Ghoting, Roy H Campbell, Christof Klausecker, Dieter Kranzlmüller, Joel Emer, Tryggve Fossum, Burton Smith, Bernard Philippe, Ahmed Sameh, François Irigoin, Paul Feautrier, Christoph von Praun, Robert L. Bocchino, Marc Snir, Thomas George, Vivek Sarin, Joefon Jann
1996-01-01
Abstract: In this paper we study the parallel aspects of PCGLS, a basic iterative method that applies the preconditioned conjugate gradient method to the normal equations, together with the Incomplete Modified Gram-Schmidt (IMGS) preconditioner, for solving least squares problems on massively parallel distributed memory computers. The performance of these methods on this kind of architecture is limited by the global communication required for the inner products. We describe the parallelization of PCGLS and the IMGS preconditioner with two improvements. One is to assemble the results of a number of inner products collectively; the other is to create situations where communication can be overlapped with computation. A theoretical model of the computation and communication phases is presented, which allows us to determine the number of processors that minimizes the runtime. Several numerical experiments on the Parsytec GC/PowerPlus are presented.
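To make the communication bottleneck concrete, the following is a minimal serial sketch of a textbook PCGLS iteration, not the parallel formulation studied in the paper. It assumes a matrix of full column rank stored as a list of rows, and it uses a simple column-norm diagonal preconditioner as a stand-in for IMGS. The names `pcgls`, `dot`, `matvec` and `rmatvec` are hypothetical helpers introduced for illustration. The comments mark the inner products that would each require a global reduction on a distributed memory machine.

```python
import math


def dot(u, v):
    # On a distributed memory machine each processor would hold a slice of
    # u and v, compute a local partial sum, and a global reduction would
    # combine the partial sums. This is the communication step.
    return sum(a * b for a, b in zip(u, v))


def matvec(A, x):
    # y = A x, with A stored as a list of rows.
    return [dot(row, x) for row in A]


def rmatvec(A, y):
    # z = A^T y.
    n = len(A[0])
    return [sum(A[i][j] * y[i] for i in range(len(A))) for j in range(n)]


def pcgls(A, b, tol=1e-12, max_iter=200):
    """Approximately minimize ||A x - b||_2 with preconditioned CGLS.

    Assumption: the preconditioner is M = diag(column norms of A),
    a simple stand-in for the IMGS preconditioner named in the abstract.
    """
    n = len(A[0])
    d = [math.sqrt(sum(row[j] ** 2 for row in A)) for j in range(n)]

    x = [0.0] * n
    r = list(b)                                      # r = b - A x, x = 0
    s = [v / dj for v, dj in zip(rmatvec(A, r), d)]  # s = M^{-T} A^T r
    p = list(s)
    gamma = dot(s, s)                                # global reduction
    gamma0 = gamma

    for _ in range(max_iter):
        if gamma <= (tol ** 2) * gamma0:
            break
        t = [pj / dj for pj, dj in zip(p, d)]        # t = M^{-1} p
        q = matvec(A, t)
        alpha = gamma / dot(q, q)                    # global reduction
        x = [xj + alpha * tj for xj, tj in zip(x, t)]
        r = [ri - alpha * qi for ri, qi in zip(r, q)]
        s = [v / dj for v, dj in zip(rmatvec(A, r), d)]
        gamma_new = dot(s, s)                        # global reduction
        beta = gamma_new / gamma
        p = [sj + beta * pj for sj, pj in zip(s, p)]
        gamma = gamma_new
    return x


if __name__ == "__main__":
    # Fit a line c0 + c1 * t to four points (illustrative data).
    A = [[1.0, 0.0], [1.0, 1.0], [1.0, 2.0], [1.0, 3.0]]
    b = [1.0, 2.0, 2.0, 4.0]
    print(pcgls(A, b))
```

In this form the two reductions inside the loop are separated by data dependencies, so each one forces a synchronization across all processors. The improvements named in the abstract target exactly these points: combining several inner products into one collective operation, and arranging the iteration so that a reduction can proceed while local computation continues.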