A 3D Parallel Algorithm for QR Decomposition

Grey Ballard, James Demmel, Laura Grigori, Mathias Jacquelin, Nicholas Knight
DOI: https://doi.org/10.48550/arXiv.1805.05278
2018-05-15
Abstract: Interprocessor communication often dominates the runtime of large matrix computations. We present a parallel algorithm for computing QR decompositions whose bandwidth cost (communication volume) can be decreased at the cost of increasing its latency cost (number of messages). By varying a parameter to navigate the bandwidth/latency tradeoff, we can tune this algorithm for machines with different communication costs.
Distributed, Parallel, and Cluster Computing
What problem does this paper attempt to address?
In large-scale matrix computations, the time spent on communication between processors often dominates the total running time. To address this, the authors propose a parallel algorithm for computing QR decompositions that exposes a tunable parameter for trading bandwidth cost (communication volume) against latency cost (number of messages). Specifically, the goals of the paper are:

1. **Reduce communication bandwidth cost**: A new 3D parallel algorithm lowers the volume of data transferred between processors.
2. **Trade latency for bandwidth**: The reduction in communication volume is paid for with a larger number of messages, that is, a higher latency cost.
3. **Adapt to machines with different communication costs**: Varying a single parameter moves the algorithm along the bandwidth/latency tradeoff, so it can be tuned to the communication costs of a given machine.

The main contribution of the paper is the 3D-CAQR-EG algorithm, which extends the Elmroth-Gustavson recursive algorithm to a distributed-memory environment and uses communication-efficient subroutines. The algorithm is particularly effective for tall-skinny matrices, where it can substantially reduce the bandwidth cost, at the price of an increased latency cost.
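The bandwidth/latency tradeoff can be illustrated with the widely used alpha-beta communication model, in which each message costs a fixed latency `alpha` and each word costs `beta` to transfer. The sketch below is illustrative only: `toy_costs`, its parameter `t`, and the baseline values are hypothetical stand-ins chosen so that messages grow and words shrink as `t` increases. They are not the cost expressions derived in the paper.

```python
def comm_time(messages, words, alpha, beta):
    """Alpha-beta model: alpha seconds per message plus beta seconds per word."""
    return alpha * messages + beta * words


def toy_costs(t, base_messages=100.0, base_words=1.0e6):
    """Hypothetical tradeoff curve (an assumption, not the paper's bounds).

    Increasing the tuning parameter t multiplies the message count by t
    and divides the communication volume by t.
    """
    return base_messages * t, base_words / t


def best_setting(candidates, alpha, beta):
    """Return the candidate t with the smallest modeled communication time."""
    return min(candidates, key=lambda t: comm_time(*toy_costs(t), alpha, beta))


if __name__ == "__main__":
    candidates = [1, 2, 4, 8, 16]
    # Machine with expensive messages: a small t (fewer messages) should win.
    print(best_setting(candidates, alpha=1e-3, beta=1e-9))
    # Machine with expensive bandwidth: a larger t (less volume) should win.
    print(best_setting(candidates, alpha=1e-6, beta=1e-8))
```

Under these assumed numbers, the machine with costly messages prefers the smallest setting, while the machine with costly bandwidth prefers a larger one. This mirrors the qualitative claim that one parameter lets the algorithm be tuned per machine.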