$\mathrm{F}^{2}$Tree: Rapid Failure Recovery for Routing in Production Data Center Networks

Guo Chen, Youjian Zhao, Hailiang Xu, Dan Pei, Dan Li
DOI: https://doi.org/10.1109/tnet.2017.2672678
2017-01-01
IEEE/ACM Transactions on Networking
Abstract: Failures are not uncommon in production data center networks (DCNs) nowadays. It takes a long time for DCN routing to recover from a failure and find new forwarding paths, which significantly impacts real-time and interactive applications at the upper layer. In this paper, we present a fault-tolerant DCN solution, called $\mathrm{F}^{2}$Tree, which is readily deployable in existing DCNs. $\mathrm{F}^{2}$Tree significantly shortens the failure recovery time with only a small amount of link rewiring and switch configuration changes. Through testbed and emulation experiments, we show that $\mathrm{F}^{2}$Tree greatly reduces the routing recovery time after a failure (by 78%) and improves the performance of upper-layer applications when a routing failure happens (96% fewer deadline-missing requests).