Geryon: Accelerating Distributed CNN Training by Network-Level Flow Scheduling

Shuai Wang,Dan Li,Jinkun Geng
DOI: https://doi.org/10.1109/infocom41043.2020.9155282
2020-07-01
Abstract:Increasingly rich data sets and complicated models make distributed machine learning more and more important. However, the cost of extensive and frequent parameter synchronizations can easily diminish the benefits of distributed training across multiple machines. In this paper, we present Geryon, a network-level flow scheduling scheme to accelerate distributed Convolutional Neural Network (CNN) training. Geryon leverages multiple flows with different priorities to transfer parameters of different urgency levels, which can naturally coordinate multiple parameter servers and prioritize the urgent parameter transfers in the entire network fabric. Geryon requires no modification in CNN models and does not affect the training accuracy. Based on the experimental results of four representative CNN models on a testbed of 8 GPU servers, Geryon achieves up to 95.7% scaling efficiency even with 10GbE bandwidth. In contrast, for most models, the scaling efficiency of vanilla TensorFlow is no more than 37% and that of TensorFlow with parameter partition and slicing is around 80%. In terms of training throughput, Geryon enhanced with parameter partition and slicing achieves up to 4.37x speedup, where the flow scheduling algorithm itself achieves up to 1.2x speedup over parameter partition and slicing.
What problem does this paper attempt to address?