A Greedy Algorithm for Optimally Pipelining a Reduction

Bradley R. Lowery, Julien Langou
DOI: https://doi.org/10.48550/arXiv.1310.4645
2013-10-17
Distributed, Parallel, and Cluster Computing
Abstract: Collective communications are ubiquitous in parallel applications. We present two new algorithms for performing a reduction. The operation associated with our reduction needs to be associative and commutative. The two algorithms are developed under two different communication models (unidirectional and bidirectional). Both algorithms use a greedy scheduling scheme. For a unidirectional, fully connected network, we prove that our greedy algorithm is optimal when some realistic assumptions are respected. Previous algorithms fit the same assumptions but are only appropriate for some given configurations; our algorithm is optimal for all configurations. We note that there are some configurations where our greedy algorithm significantly outperforms any existing algorithm. This result represents a contribution to the state of the art. For a bidirectional, fully connected network, we present a different greedy algorithm. We verify by experimental simulations that our algorithm matches the time complexity of an optimal broadcast (with the addition of the computation). Besides reversing an optimal broadcast algorithm, the greedy algorithm is the first known reduction algorithm to experimentally attain this time complexity. Simulations show that this greedy algorithm performs well in practice, outperforming state-of-the-art reduction algorithms. Positive experiments on a parallel distributed machine are also presented.
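To make concrete why the reduction operation must be associative and commutative, the sketch below simulates a plain pairwise (tree) reduction in which every ready partial result is combined as soon as it has a partner, so the order of combination is not fixed. This is a hypothetical illustration, not the authors' greedy pipelined algorithm: the function name `greedy_tree_reduce` is ours, messages are not split into segments (no pipelining), and each round is assumed to cost one unit of communication plus computation.

```python
import operator


def greedy_tree_reduce(values, op):
    """Combine one value per processor with a pairwise round-based schedule.

    Hypothetical illustration only (not the paper's algorithm): in each
    round, ready partial results are paired off; one member of each pair
    "sends" its value to the other, which combines the two with op.
    Because the pairing order is arbitrary, op is assumed to be
    associative and commutative. Pipelining (message segmentation) is
    not modeled. Returns (result, number_of_rounds).
    """
    if not values:
        raise ValueError("need at least one processor value")
    ready = list(values)
    rounds = 0
    while len(ready) > 1:
        nxt = []
        # Pair adjacent partial results; each pair produces one combined value.
        for i in range(0, len(ready) - 1, 2):
            nxt.append(op(ready[i], ready[i + 1]))
        # With an odd count, the unpaired value waits for the next round.
        if len(ready) % 2 == 1:
            nxt.append(ready[-1])
        ready = nxt
        rounds += 1
    return ready[0], rounds


if __name__ == "__main__":
    total, steps = greedy_tree_reduce([3, 1, 4, 1, 5, 9, 2], operator.add)
    print(total, steps)  # 25 3
```

With p processors this schedule finishes in ceil(log2 p) rounds. The paper's contribution lies in going beyond such a fixed tree by scheduling pipelined message segments greedily, which this sketch deliberately omits.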