Efficient Data Aggregation Transfers in Data Center Networks

LU Fei-Fei, GUO De-Ke, FANG Xing, XIE Xiang-Hui, LUO Xing-Guo
DOI: https://doi.org/10.11897/sp.j.1016.2016.01750
2016-01-01
Abstract: In data centers, distributed computing systems such as MapReduce produce massive amounts of traffic between successive processing stages. These shuffle transfers make east-west network bandwidth a bottleneck. In many common workloads, the data flows from all senders to each receiver are highly correlated, so many state-of-the-practice systems already apply aggregation functions at the receiver side of a shuffle transfer to reduce the output data size. To reduce network traffic and use network bandwidth efficiently, we introduce in-network aggregation for correlated traffic and parallelize the shuffle and reduce phases. This significantly reduces consumption of scarce east-west network resources and avoids the long latency introduced by the shuffle phase of MapReduce jobs. The existing IRS-based algorithm has certain limitations. To address them, we first build a model of the minimal incast tree in BCube, a representative server-centric network structure for future data centers, and propose two approximate incast tree construction methods, MIB-based and MC-based, which rely solely on the labels of the senders and the data center topology. The MIB-based method applies to highly correlated senders: it builds a minimal incast tree by trying to aggregate high-level senders into low-level senders. The MC-based method applies to loosely correlated senders: it builds a minimal incast tree by aggregating nodes as far as possible while adding as few new nodes as possible. We then combine the two into the M2-based method, which handles any case. An analysis of the time complexity of M2-based tree construction shows that it is fast enough to build incast trees online.
Finally, we analyze how well the M2-based method adapts to other data center structures, and explain how in-network aggregation reduces job execution time. Small-scale experiments show that, across data centers of different sizes, the M2-based method reduces network traffic by 3% on average compared with the IRS-based method, and cuts the waiting time of a job in the shuffle and reduce phases by about two-thirds compared with the existing method, which performs no in-network aggregation. Across incast transfers of different sizes, the M2-based method reduces network traffic by 19% on average compared with the IRS-based method, and cuts the waiting time of a job in the shuffle and reduce phases by about three-quarters compared with the existing method.
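To illustrate why the shape of the incast tree matters, here is a minimal Python sketch. It is not the paper's MIB-based, MC-based, or M2-based construction. It assumes only the standard BCube property that servers carry digit-tuple labels and that two servers are one server-to-server hop apart when their labels differ in exactly one digit. It further assumes that every sender emits one unit of data and that aggregation is perfect (any number of inputs merged at a server yields one unit of output, as with a sum or count). The function names and the fixed digit-correction order are hypothetical choices made for illustration.

```python
from typing import Iterable, List, Sequence, Set, Tuple

Label = Tuple[int, ...]  # BCube server label, one digit per level


def hamming(a: Label, b: Label) -> int:
    """Number of differing digits = server-to-server hops between a and b."""
    return sum(x != y for x, y in zip(a, b))


def path(src: Label, dst: Label, order: Sequence[int]) -> List[Label]:
    """Route from src to dst, correcting one differing digit per hop
    in the given digit order (an assumed, simplistic routing rule)."""
    hops = [src]
    cur = list(src)
    for i in order:
        if cur[i] != dst[i]:
            cur[i] = dst[i]
            hops.append(tuple(cur))
    return hops


def unicast_cost(senders: Iterable[Label], receiver: Label) -> int:
    """Traffic without in-network aggregation: every flow pays its full path."""
    return sum(hamming(s, receiver) for s in senders)


def aggregated_cost(senders: Iterable[Label], receiver: Label,
                    order: Sequence[int]) -> int:
    """Traffic with perfect aggregation: flows that meet at a server are merged,
    so each edge of the union of paths carries exactly one unit."""
    edges: Set[Tuple[Label, Label]] = set()
    for s in senders:
        p = path(s, receiver, order)
        edges.update(zip(p, p[1:]))
    return len(edges)


# Two-digit labels, as in a BCube with two levels of switches.
receiver = (0, 0)
senders = [(1, 1), (2, 1), (3, 1), (1, 2)]

no_agg = unicast_cost(senders, receiver)
agg_a = aggregated_cost(senders, receiver, order=(0, 1))
agg_b = aggregated_cost(senders, receiver, order=(1, 0))
print(no_agg, agg_a, agg_b)
```

With a fixed correction order, each server has a single next hop toward the receiver, so the union of paths forms a tree rooted at the receiver. In this toy case the two orders yield trees of different sizes, which is the kind of gap a better incast tree construction method aims to close.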