Reliable Multicast in Data Center Networks
Dan Li, Mingwei Xu, Ying Liu, Xia Xie, Yong Cui, Jingyi Wang, Guihai Chen
DOI: https://doi.org/10.1109/TC.2013.91
2014-01-01
Abstract: Multicast benefits data center group communication by both saving network traffic and improving application throughput. Reliable packet delivery is required in data center multicast for data-intensive computations. However, existing reliable multicast solutions for the Internet are not suitable for the data center environment, especially with regard to keeping multicast throughput from degrading upon packet loss, which is the norm rather than the exception in data centers. We present RDCM, a novel reliable multicast protocol for data center networks. The key idea of RDCM is to minimize the impact of packet loss on multicast throughput by leveraging the rich link resources in data centers. A multicast-tree-aware backup overlay is explicitly built on group members for peer-to-peer packet repair. The backup overlay is organized so that it imposes little individual repair burden, little control overhead, and little overall repair traffic. RDCM also implements window-based congestion control to adapt its sending rate to the traffic status in the network. Simulation results in typical data center networks show that RDCM achieves higher application throughput and a smaller traffic footprint than other representative reliable multicast protocols. We have implemented RDCM as a user-level library on the Windows platform. Experiments on our testbed show that RDCM handles packet loss without obvious throughput degradation during high-speed data transmission, responds gracefully to link failures and receiver failures, and incurs less than 10% CPU overhead on data center servers.
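The abstract states only that RDCM uses window-based congestion control; it does not give the algorithm. As a rough illustration of what window-based sending means for a one-to-many sender, the sketch below assumes a generic AIMD (additive-increase, multiplicative-decrease) window whose left edge advances only as fast as the slowest receiver's cumulative acknowledgment. The class `MulticastWindow` and its methods are hypothetical names, and this is not presented as RDCM's actual mechanism.

```python
from dataclasses import dataclass, field


@dataclass
class MulticastWindow:
    """Hypothetical AIMD send window for one multicast sender.

    A generic illustration only; it is NOT RDCM's published algorithm.
    """
    receivers: list
    cwnd: float = 1.0                          # congestion window, in packets
    next_seq: int = 0                          # next sequence number to send
    acked: dict = field(default_factory=dict)  # receiver -> highest cumulative ACK

    def __post_init__(self) -> None:
        for r in self.receivers:
            self.acked[r] = -1  # nothing acknowledged yet

    def base(self) -> int:
        # Left edge of the window: first packet not yet ACKed by every receiver,
        # so the window advances only as fast as the slowest receiver.
        return min(self.acked.values()) + 1

    def can_send(self) -> bool:
        # At most int(cwnd) packets may be outstanding beyond the base.
        return self.next_seq < self.base() + int(self.cwnd)

    def send(self) -> int:
        if not self.can_send():
            raise RuntimeError("window is full")
        seq = self.next_seq
        self.next_seq += 1
        return seq

    def on_ack(self, receiver, cum_ack: int) -> None:
        old_base = self.base()
        self.acked[receiver] = max(self.acked[receiver], cum_ack)
        advanced = self.base() - old_base
        if advanced > 0:
            # Additive increase: roughly +1 packet per window's worth of progress.
            self.cwnd += advanced / self.cwnd

    def on_congestion(self) -> None:
        # Multiplicative decrease on whatever congestion signal is assumed.
        self.cwnd = max(1.0, self.cwnd / 2)
```

With two receivers and `cwnd=2.0`, the sender can emit packets 0 and 1, then stalls until both receivers acknowledge packet 0; only then does the window slide and grow. Gating on the slowest receiver is one common design choice for reliable multicast; a protocol that repairs losses from peers, as RDCM's backup overlay does, could plausibly relax this coupling, but the abstract does not say how.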