Flow-level Adaptive Routing Scheme for RDMA Enabled Dragonfly Network

Hang Wang, Ming Zhang, Peilin Hong
DOI: https://doi.org/10.1109/globecom46510.2021.9685963
2021-01-01
Abstract: Because it minimizes the number of expensive global links, the Dragonfly topology has developed rapidly in today's data centers. However, deploying Remote Direct Memory Access (RDMA) applications inside a Dragonfly network requires a routing scheme that operates at the flow level, so that packets of the same flow are not reordered. The existing Dragonfly routing scheme uses queue length to estimate link load, which is not a reasonable criterion for flow-level routing. In this paper, we use the amount of data remaining in each flow to estimate flow completion time and propose a routing scheme named Remaining Data-based Adaptive Load-balance (RDAL). We compare the performance of RDAL with another routing scheme at the flow level. Our simulation shows that, for flow-level routing, RDAL improves both average flow completion time and saturation throughput, especially under the adversarial traffic pattern. Compared with UGAL, the state-of-the-art routing scheme for Dragonfly, RDAL increases saturation throughput by up to 12% and reduces average flow completion time by up to 34%.
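The abstract does not spell out how RDAL turns remaining data into a routing decision, so the following Python sketch is only an illustration of the general idea, not the authors' algorithm. It assumes that each link tracks the unsent bytes of the flows pinned to it, that a path's completion time is estimated from its most backlogged link, and that a new flow is pinned to whichever candidate path (minimal or non-minimal) has the smaller estimate. The names `Link`, `estimated_fct`, and `choose_path` are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Link:
    """A network link (hypothetical model, not taken from the paper)."""
    capacity_bps: float            # link bandwidth in bits per second
    remaining_bytes: float = 0.0   # unsent bytes of flows already pinned here


def estimated_fct(path, flow_bytes):
    """Assumed estimate: the new flow finishes once the most backlogged
    link on the path has drained its backlog plus the new flow."""
    return max((link.remaining_bytes + flow_bytes) * 8 / link.capacity_bps
               for link in path)


def choose_path(minimal, non_minimal, flow_bytes):
    """Pin the whole flow to the candidate path with the smaller estimate.

    The decision is made once per flow, so packets of the flow all follow
    the same path and cannot be reordered by routing.
    """
    t_min = estimated_fct(minimal, flow_bytes)
    t_non = estimated_fct(non_minimal, flow_bytes)
    chosen = minimal if t_min <= t_non else non_minimal
    for link in chosen:
        link.remaining_bytes += flow_bytes  # account for the new flow
    return chosen
```

In this sketch a short queue on a link that still carries a large unfinished flow would look attractive to a queue-length criterion, whereas the remaining-data estimate steers the new flow away from it. A full model would also decrement `remaining_bytes` as data is sent, which is omitted here.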