Fault-Tolerant Adaptive Routing in Dragonfly Networks

Dong Xiang, Bing Li, Yi Fu
DOI: https://doi.org/10.1109/TDSC.2017.2693372
2019-01-01
IEEE Transactions on Dependable and Secure Computing
Abstract: Dragonfly networks are widely used in current high-performance computers or high-end servers. Fault-tolerant routing in dragonfly networks is essential. The rich interconnects give the network good fault-tolerance capability. A new deadlock-free adaptive fault-tolerant routing algorithm, based on a new two-layer safety information model, is proposed by mapping the routers in a group and the groups of the dragonfly network into two separate hypercubes. The new algorithm tolerates both static and dynamic faults. Using the new safety information model, our method can determine at the source whether a packet can reach its destination, which avoids dead ends and aimless misrouting. Extensive simulation results show that, in many cases, the proposed fault-tolerant routing algorithm even outperforms the previous minimal routing algorithm in fault-free networks.