How Useful is Communication Scheduling for Distributed Training?

Yihao Zhao, Xuanzhe Liu, Xin Jin
DOI: https://doi.org/10.1109/monetec60984.2024.10768125
2024-01-01
Abstract: Recently, there has been a resurgence of packet scheduling ideas in the form of communication scheduling in the application layer for distributed training. Given recent results on potentially huge improvements, it is critical to properly interpret these results and understand how far we can go. We take a first-principles approach to analyzing and understanding the role of communication scheduling in distributed training. We formulate a mathematical model to represent the computation and communication pattern, and prove that the upper bound of improvements with communication scheduling is 3× for widely-used distributed training architectures. More importantly, we establish a quantitative relationship between the benefit of communication scheduling and the computation-to-communication ratio. While the exact curve for each model varies, we demonstrate that the curves for all models have the same shape: concave. Surprisingly, contrary to the common belief, for varying models and hardware configurations, we find that communication scheduling can offer only limited improvements in addition to overlapping. Our results raise the question of the necessity of overloading parameter transmission with application-layer semantics. Additionally, we provide both theoretical analysis and empirical studies to show that most improvements can be obtained with well-understood network-layer methods without having to obtain application-layer knowledge.