Round-Robin Synchronization: Mitigating Communication Bottlenecks in Parameter Servers.

Chen, Wei Wang, Bo Li
DOI: https://doi.org/10.1109/infocom.2019.8737587
2019-01-01
Abstract: Deep learning is usually performed in GPU clusters, where each worker machine iteratively refines the model parameters by communicating its updates to the Parameter Server (PS). More often than not, workers communicate synchronously, so as to avoid using outdated parameters and to make a high-quality refinement in each iteration. However, because all workers synchronize with the PS simultaneously, communication becomes a severe bottleneck. To address this problem, we propose the Round-Robin Synchronous Parallel (R2SP) scheme, which coordinates workers to make updates in an evenly gapped, round-robin manner. This way, R2SP minimizes network contention at minimal cost to refinement quality. We further extend R2SP to heterogeneous clusters by adaptively tuning the batch size of each worker based on its processing capability. We have implemented R2SP as a ready-to-use Python library for existing deep learning frameworks. Deployments on EC2 GPU clusters show that R2SP effectively mitigates the communication bottleneck, accelerating the training of popular image classification models by up to 25%.
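The two ideas in the abstract, evenly gapped round-robin updates and capability-based batch sizing, can be illustrated with a minimal sketch. This is not the authors' library: the function names, the use of a fixed per-iteration time, and the rule that batch sizes are proportional to measured throughput are all assumptions made here for illustration, since the abstract does not specify them.

```python
def round_robin_offsets(num_workers, iteration_time):
    """Hypothetical helper: start offsets that stagger workers evenly.

    Worker i begins communicating i * (iteration_time / num_workers)
    after worker 0, so updates reach the PS one at a time instead of
    all at once.
    """
    gap = iteration_time / num_workers
    return [i * gap for i in range(num_workers)]


def proportional_batch_sizes(throughputs, total_batch):
    """Hypothetical helper: split a global batch across workers.

    Assumes (not stated in the abstract) that each worker's share is
    proportional to its throughput in samples per second, so that all
    workers need roughly the same time per iteration. Leftover samples
    from rounding go to the workers with the largest fractional parts.
    """
    total_throughput = sum(throughputs)
    raw = [t * total_batch / total_throughput for t in throughputs]
    sizes = [int(r) for r in raw]
    leftover = total_batch - sum(sizes)
    # Sort worker indices by descending fractional part (stable on ties).
    order = sorted(range(len(raw)), key=lambda i: sizes[i] - raw[i])
    for i in order[:leftover]:
        sizes[i] += 1
    return sizes


if __name__ == "__main__":
    # Four workers, 2 seconds per iteration: updates are 0.5 s apart.
    print(round_robin_offsets(4, 2.0))
    # A worker twice as fast as its peers gets twice the batch.
    print(proportional_batch_sizes([100, 200, 100], 256))
```

With equal per-iteration times, the staggered offsets keep the workers' updates evenly spaced in every subsequent round, which is why the batch-size adjustment matters in a heterogeneous cluster.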