CEFS: compute-efficient flow scheduling for iterative synchronous applications

Shuai Wang, Dan Li, Jiansong Zhang, Wei Lin
DOI: https://doi.org/10.1145/3386367.3431307
2020-01-01
Abstract: Iterative Synchronous Applications (ISApps), represented by distributed deep learning (DL) training, are popular in today's data centers. In ISApps, multiple nodes carry out the computing task iteratively and globally synchronize the results in each iteration. To increase the scaling efficiency of ISApps, in this paper we propose a new flow scheduling approach called CEFS. CEFS reduces the waiting time of computing nodes in two ways. Within a single node, flows carrying data that can trigger earlier computation at that node are assigned higher priority; among nodes, flows destined for slower nodes are assigned higher priority. Realizing CEFS in real systems poses several challenges, e.g., the limited number of priority queues on commodity switches, the combination of the two types of priorities, and the adaptation to different applications and hardware environments. To address them, we design an online priority assignment algorithm based on Bayesian optimization that satisfies a two-dimensional order-preserving rule. We implement a CEFS prototype and evaluate it both on a 16-node GPU/RoCEv2 testbed, by training typical DL models, and in NS-3 simulations. Compared with TensorFlow and two representative scheduling solutions, TicTac and ByteScheduler, CEFS improves the training throughput by up to 253%, 252% and 47%, respectively. In addition, the scaling efficiency of the 16-node system under TensorFlow, TicTac, ByteScheduler and CEFS is 26.6%~46.9%, 26.7%~47.0%, 63.9%~80.3%, and 92.9%~94.7%, respectively. The NS-3 simulation results show that CEFS can achieve similar scaling efficiency even at a larger scale.
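To make the idea of combining the two priority types concrete, the following is a minimal Python sketch, not the CEFS algorithm itself. It assumes each flow is described by two hypothetical attributes: `compute_order` (lower means its data is needed earlier at the receiver) and `node_lag` (larger means the destination node is slower). A fixed weighted sum stands in for the paper's online Bayesian optimization, and the result is quantized into a small number of switch queues. All names (`Flow`, `assign_queues`, `is_order_preserving`, `weight`) are illustrative assumptions. The check at the end expresses one plausible reading of a two-dimensional order-preserving rule: a flow that is at least as urgent in both dimensions never lands in a lower-priority queue.

```python
from dataclasses import dataclass
from itertools import combinations


@dataclass(frozen=True)
class Flow:
    name: str
    compute_order: int  # hypothetical: 0 = data needed first at the receiver
    node_lag: float     # hypothetical: non-negative, larger = slower destination


def assign_queues(flows, num_queues, weight=0.5):
    """Map each flow to a queue index in 0..num_queues-1 (0 = highest priority).

    `weight` trades off the intra-node dimension against the inter-node one.
    Here it is a fixed constant; tuning it online is outside this sketch.
    """
    max_order = max(f.compute_order for f in flows) or 1
    max_lag = max(f.node_lag for f in flows) or 1.0

    queues = {}
    for f in flows:
        early = 1.0 - f.compute_order / max_order  # 1.0 = needed first
        slow = f.node_lag / max_lag                # 1.0 = slowest node
        urgency = weight * early + (1.0 - weight) * slow  # in [0, 1]
        # Monotone quantization: higher urgency gives a lower queue index.
        q = int((1.0 - urgency) * num_queues)
        queues[f.name] = max(0, min(q, num_queues - 1))
    return queues


def is_order_preserving(flows, queues):
    """Check that dominance in both dimensions never loses priority."""
    for a, b in combinations(flows, 2):
        for x, y in ((a, b), (b, a)):
            dominates = (x.compute_order <= y.compute_order
                         and x.node_lag >= y.node_lag)
            if dominates and queues[x.name] > queues[y.name]:
                return False
    return True


flows = [
    Flow("a", compute_order=0, node_lag=2.0),
    Flow("b", compute_order=3, node_lag=0.0),
    Flow("c", compute_order=1, node_lag=1.0),
    Flow("d", compute_order=3, node_lag=2.0),
]
queues = assign_queues(flows, num_queues=4)
print(queues, is_order_preserving(flows, queues))
```

Because both the weighted sum and the quantization step are monotone in each dimension, the check holds for any `weight` between 0 and 1. What changes with `weight` is how flows that are incomparable (earlier in one dimension, later in the other) are ranked against each other, which is the kind of choice an online tuner could make.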