A Survey of Methods for Collective Communication Optimization and Tuning

Udayanga Wickramasinghe, Andrew Lumsdaine
DOI: https://doi.org/10.48550/arXiv.1611.06334
2016-11-19
Abstract: New developments in HPC technology, in terms of increasing computing power on multi/many-core processors, high-bandwidth memory/IO subsystems, and communication interconnects, have a direct impact on software and runtime system development. These advancements have been useful in producing high-performance collective communication interfaces that integrate efficiently on a wide variety of platforms and environments. However, the number of optimization options that show up with each new technology or software framework has resulted in a combinatorial explosion in the feature space for tuning collective parameters, such that finding the optimal set has become a nearly impossible task. The applicability of the algorithmic choices available for optimizing collective communication depends largely on the scalability requirements of a particular use case. This problem can be further exacerbated by any requirement to run collective problems at very large scales, such as in exascale computing, where impractical brute-force tuning may require many months of resources. Therefore, the application of statistical, data mining, and artificial intelligence or, more generally, hybrid learning models seems essential in many collective parameter optimization problems. We hope to explore current and cutting-edge collective communication optimization and tuning methods and culminate with possible future directions for this problem.
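To make the combinatorial-explosion argument concrete, the following Python sketch counts the configurations in a small, hypothetical tuning space for one collective and estimates how long exhaustive benchmarking would take. The parameter names, value ranges, and the 60-second per-benchmark cost are illustrative assumptions, not figures from the paper or from any MPI library.

```python
from math import prod
from typing import Dict, List

# Hypothetical tuning space for a single collective (e.g., an allreduce).
# Parameter names and value sets are illustrative assumptions only.
TUNING_SPACE: Dict[str, List] = {
    "algorithm": ["linear", "binomial_tree", "recursive_doubling", "ring", "rabenseifner"],
    "segment_size_bytes": [0, 1024, 4096, 16384, 65536, 262144],
    "tree_fanout": [2, 4, 8, 16],
    "num_processes": [2 ** k for k in range(1, 17)],       # 2 .. 65536 (16 values)
    "message_size_bytes": [2 ** k for k in range(0, 25)],  # 1 B .. 16 MiB (25 values)
}


def space_size(space: Dict[str, List]) -> int:
    """Exhaustive search must visit the product of all per-parameter choices."""
    return prod(len(values) for values in space.values())


def brute_force_days(space: Dict[str, List], seconds_per_benchmark: float) -> float:
    """Serial wall-clock days needed to benchmark every configuration once."""
    return space_size(space) * seconds_per_benchmark / 86_400


if __name__ == "__main__":
    print(space_size(TUNING_SPACE))                        # 48000
    print(round(brute_force_days(TUNING_SPACE, 60.0), 1))  # 33.3
    # One extra two-valued parameter doubles the space.
    extended = {**TUNING_SPACE, "protocol": ["eager", "rendezvous"]}
    print(space_size(extended))                            # 96000
```

With these assumed values the space holds 5 × 6 × 4 × 16 × 25 = 48,000 configurations, roughly 33 days of serial benchmarking at 60 s each. Adding a single two-valued parameter (here a hypothetical eager/rendezvous protocol switch) doubles both figures, which is why exhaustive search stops being practical as new options accumulate.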
Distributed, Parallel, and Cluster Computing
What problem does this paper attempt to address?
The paper addresses how to optimize collective communication operations for performance amid the rapid development of high-performance computing (HPC) technology. Advances in multi-core processors, high-bandwidth memory and input/output subsystems, and communication interconnects pose new challenges for software and runtime system development. These advances have enabled high-performance collective communication interfaces that integrate efficiently across a wide range of platforms and environments. However, each new technology or software framework adds more optimization options, and the resulting combinatorial explosion of the parameter space makes finding the optimal parameter set a nearly impossible task. The problem is especially acute at very large scales, such as exascale computing, where brute-force tuning could require months of machine time. The paper therefore argues that statistical, data mining, and artificial intelligence techniques, or more general hybrid learning models, are essential for many collective parameter optimization problems.

The paper surveys current and cutting-edge methods for collective communication optimization and tuning and looks ahead to possible future directions. Specifically, it discusses static and dynamic collective tuning methods and their use in applications that rely on collectives, offering a view of collective optimization from the micro to the macro level. Ultimately, it aims to propose UMTAC (Unified Multidimensional Tuning Architecture), a practical, unified architecture for the collective tuning problem that combines the best practices of existing methods while avoiding some of the problems the paper discusses.
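The static/dynamic distinction can be illustrated with a minimal Python sketch under a toy setting: a static tuner ships a decision rule fixed offline, while a dynamic tuner times each candidate algorithm at runtime and then commits to the fastest. The algorithm names, thresholds, and the deterministic latency table used in the simulation are hypothetical; a real dynamic tuner would time actual collective calls and would have to cope with noisy measurements.

```python
from typing import Dict, List, Optional


# Static tuning (sketch): a decision rule fixed ahead of time, as an offline
# tuning run might emit. Thresholds and algorithm names are hypothetical.
def static_select(message_size: int, num_procs: int) -> str:
    if message_size < 2048:
        return "recursive_doubling"
    if num_procs <= 8:
        return "binomial_tree"
    return "ring"


class DynamicSelector:
    """Dynamic tuning (sketch): time each candidate for a fixed number of
    calls, then commit to the one with the lowest mean observed latency."""

    def __init__(self, candidates: List[str], trials_per_candidate: int = 3):
        self.candidates = candidates
        self.trials = trials_per_candidate
        self.samples: Dict[str, List[float]] = {c: [] for c in candidates}
        self.choice: Optional[str] = None

    def next_algorithm(self) -> str:
        if self.choice is not None:
            return self.choice
        # Exploration phase: return the first candidate still lacking samples.
        for c in self.candidates:
            if len(self.samples[c]) < self.trials:
                return c
        # Every candidate has been measured: commit to the fastest on average.
        self.choice = min(
            self.candidates,
            key=lambda c: sum(self.samples[c]) / len(self.samples[c]),
        )
        return self.choice

    def record(self, algorithm: str, seconds: float) -> None:
        # Measurements are only kept during the exploration phase.
        if self.choice is None:
            self.samples[algorithm].append(seconds)


def simulate(selector: DynamicSelector, latency: Dict[str, float], calls: int) -> Optional[str]:
    """Drive the selector with a hypothetical, noise-free latency table."""
    for _ in range(calls):
        alg = selector.next_algorithm()
        selector.record(alg, latency[alg])
    return selector.choice


if __name__ == "__main__":
    print(static_select(1024, 64))     # recursive_doubling (small message)
    print(static_select(1 << 20, 64))  # ring (large message, many processes)
    latency = {"binomial_tree": 3.0e-4, "recursive_doubling": 2.0e-4, "ring": 5.0e-4}
    sel = DynamicSelector(list(latency))
    print(simulate(sel, latency, calls=12))  # recursive_doubling
```

Here the dynamic selector spends its first nine calls on measurement (three per candidate) and settles on `recursive_doubling`, the cheapest entry in the assumed latency table. That warm-up overhead is the price dynamic tuning pays for adapting to the machine at hand instead of trusting an offline table.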