Impact of Synchronization Topology on DML Performance: Both Logical Topology and Physical Topology
Shuai Wang, Jinkun Geng, Dan Li
DOI: https://doi.org/10.1109/tnet.2021.3117042
2021-01-01
IEEE/ACM Transactions on Networking
Abstract: To handle increasingly large training data and models, researchers and engineers resort to multiple servers in a data center for distributed machine learning (DML). On one hand, DML lets us leverage the computation power of multiple servers, which can effectively accelerate computation-intensive tasks. On the other hand, DML also incurs significant communication cost due to parameter synchronization among these servers. In this paper, we explore the impact of synchronization topology, including both logical topology and physical topology, on DML performance. First, we revisit the existing logical topologies for parameter synchronization, e.g., parameter server and ring allreduce, and find that these flat synchronization topologies are inefficient for large-scale DML training. Therefore, we propose a hierarchical parameter synchronization topology, called HiPS, which can achieve efficient parameter synchronization even at large scale. Then, we compare two representative physical network topologies, namely Fat-Tree and BCube. Based on our analyses, BCube has several advantages over Fat-Tree, e.g., higher bandwidth, better load balance, and lower hardware cost. The simulation results also show that BCube is friendlier to RDMA. Relying on the advantages of HiPS and BCube, the GST of "HiPS+BCube" is 12% to 70% lower than that of other combinations. Moreover, when the cluster size increases from 16 to 1024, the performance of "HiPS+BCube" drops by only 6.5%, while the performance of "Ringd-BCube" drops by 44.6%. Hence, we believe "HiPS+BCube" is the optimal solution to benefit DML at large scale.
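For readers unfamiliar with the flat logical topologies the abstract refers to, the following is a minimal sketch of the standard ring allreduce pattern, simulated in a single process. It is not the paper's HiPS design or its evaluation code; the function name `ring_allreduce` and the assumption that each worker's gradient is pre-split into exactly one chunk per worker are illustrative choices made here.

```python
def ring_allreduce(worker_chunks):
    """Simulate a sum ring allreduce in one process (illustrative only).

    worker_chunks[i][c] is chunk c of worker i's gradient. Assumption:
    every worker holds exactly n chunks, where n is the number of workers.
    """
    n = len(worker_chunks)
    data = [list(chunks) for chunks in worker_chunks]  # copy the inputs

    # Phase 1, reduce-scatter: n - 1 steps. In step s, worker i sends
    # chunk (i - s) % n to its ring successor, which adds it to its own copy.
    for s in range(n - 1):
        # Snapshot all messages first so the sends in one step are simultaneous.
        sends = [(i, (i - s) % n, data[i][(i - s) % n]) for i in range(n)]
        for i, c, value in sends:
            data[(i + 1) % n][c] += value
    # Worker i now holds the fully reduced chunk (i + 1) % n.

    # Phase 2, allgather: n - 1 steps. In step s, worker i forwards the
    # reduced chunk (i + 1 - s) % n to its successor, which overwrites its copy.
    for s in range(n - 1):
        sends = [(i, (i + 1 - s) % n, data[i][(i + 1 - s) % n]) for i in range(n)]
        for i, c, value in sends:
            data[(i + 1) % n][c] = value

    return data


if __name__ == "__main__":
    grads = [[1, 2, 3], [10, 20, 30], [100, 200, 300]]
    for worker_id, reduced in enumerate(ring_allreduce(grads)):
        print(worker_id, reduced)
```

In this pattern each worker sends 2(n - 1) chunks in total, over 2(n - 1) sequential steps, so the number of steps grows linearly with the number of workers in the ring.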