A Survey of Methods for Collective Communication Optimization and Tuning

Udayanga Wickramasinghe, Andrew Lumsdaine

DOI: https://doi.org/10.48550/arXiv.1611.06334

2016-11-19

Abstract: New developments in HPC technology, in terms of increasing computing power on multi/many-core processors, high-bandwidth memory/IO subsystems, and communication interconnects, have a direct impact on software and runtime system development. These advancements have become useful in producing high-performance collective communication interfaces that integrate efficiently on a wide variety of platforms and environments. However, the number of optimization options that shows up with each new technology or software framework has resulted in a combinatorial explosion in the feature space for tuning collective parameters, such that finding the optimal set has become a nearly impossible task. The applicability of the algorithmic choices available for optimizing collective communication depends largely on the scalability requirement for a particular use case. This problem can be further exacerbated by any requirement to run collective problems at very large scales, such as in exascale computing, where tuning by brute force is impractical and may require many months of resources. Therefore, the application of statistical, data mining, and artificial intelligence or more general hybrid learning models seems essential in many collective parameter optimization problems. We aim to explore current and cutting-edge collective communication optimization and tuning methods and to conclude with possible future directions for this problem.

Distributed, Parallel, and Cluster Computing

What problem does this paper attempt to address?

This paper addresses how to optimize collective communication operations to improve performance amid the rapid development of high-performance computing (HPC) technology. Progress in multi-core processors, high-bandwidth memory/input-output subsystems, and communication interconnects poses new challenges for the development of software and runtime systems. These advances have promoted the emergence of high-performance collective communication interfaces that integrate efficiently across various platforms and environments. However, each new technology or software framework adds further optimization options, leading to a combinatorial explosion that makes finding the optimal set of parameters almost impossible. The problem is more prominent for large-scale applications such as exascale computing, because at this scale tuning by brute-force methods is impractical and may require months of resources. The paper therefore argues that statistics, data mining, and artificial intelligence, or more general hybrid learning models, become crucial in many collective parameter optimization problems. The paper aims to explore current and cutting-edge collective communication optimization and tuning methods and to look ahead to possible future directions for solving these problems. Specifically, it discusses static and dynamic collective tuning methods and their use in applications that rely on collectives, providing a view of collective optimization from a micro to a macro perspective. Ultimately, the paper hopes to propose a practical and unified architecture, UMTAC (Unified Multidimensional Tuning Architecture), to solve the collective tuning problem. This architecture aims to combine the best practices of existing methods while avoiding some of the problems discussed in the paper.

