FC+: Near-optimal Deadlock-free Expander Data Center Networks

Xiao Zhang, Peirui Cao, Yongxi Lyu, Qizhou Zhang, Shizhen Zhao, Xinbing Wang, Chenghu Zhou
DOI: https://doi.org/10.1109/ispa-bdcloud-socialcom-sustaincom59178.2023.00033
2023-01-01
Abstract: Expander networks have gained attention as a cost-efficient alternative to expensive Clos networks in data centers. However, because PFC-enabled RoCE networks are now widely deployed, expander networks face deadlocks. Existing methods for handling deadlocks in expander networks either compromise performance or fail to eliminate deadlocks completely. After identifying path diversity as the performance bottleneck of FC (Flatten Clos), we present FC+ (Flatten Clos Plus), a topology-routing co-design that eliminates deadlocks and achieves near-optimal performance. Like FC, FC+ maps its topology onto a multi-layered virtual topology and performs up-down routing to eliminate deadlocks. On top of this, FC+ introduces two new designs that effectively improve path diversity. First, FC+ adopts a non-uniform virtual multi-layer design, which greatly increases the number of deadlock-free paths. Second, FC+ routes with deadlock-free K-Shortest Paths (DFKSP), which makes better use of this path diversity. We evaluate throughput under different traffic patterns. With one lossless priority, FC+ consistently outperforms FC, and the improvement reaches 1.4x to 2x under near-worst-case traffic. Another advantage of FC+ over FC is that its DFKSP routing can use more than one lossless priority to further improve performance. Compared with Tagger, the state-of-the-art lossless-priority management method for avoiding deadlocks, FC+ reduces the number of lossless priorities needed for near-optimal performance from 3 to 2.
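The abstract does not spell out how FC+ assigns virtual layers or how DFKSP enumerates paths. Purely as an illustration of the general idea, the sketch below assumes that every switch carries a distinct virtual rank such as (layer, switch id) and calls a hop "up" when the rank increases. Under those assumptions, it returns up to k shortest hop-count paths that never turn from down to up. This classic up-down rule keeps the channel dependency graph acyclic, which is what rules out PFC deadlocks. The function df_k_shortest_paths, the rank encoding, and the toy topology are hypothetical, and the breadth-first enumeration is a stand-in, not the paper's DFKSP algorithm.

```python
from collections import deque


def df_k_shortest_paths(adj, rank, src, dst, k):
    """Return up to k shortest (by hop count) simple paths from src to dst
    that obey the up-down rule: any number of 'up' hops followed by any
    number of 'down' hops, never a down hop followed by an up hop.

    adj:  dict mapping node -> list of neighbours (bidirectional links).
    rank: dict mapping node -> comparable virtual rank, e.g. (layer, id).
          Ranks are assumed distinct, so every hop is strictly up or down.
    """
    results = []
    # Breadth-first search over partial paths: each entry stores the path so
    # far and whether it has already taken a down hop. FIFO order means
    # paths are dequeued in non-decreasing hop count.
    queue = deque([([src], False)])
    while queue and len(results) < k:
        path, went_down = queue.popleft()
        u = path[-1]
        if u == dst:
            results.append(path)
            continue
        for v in adj[u]:
            if v in path:  # keep paths loop-free
                continue
            up = rank[v] > rank[u]
            if up and went_down:  # a down->up turn is forbidden
                continue
            queue.append((path + [v], went_down or not up))
    return results


# Hypothetical toy topology: a 4-cycle A-B-C-D plus a top node E linked to B and D.
TOY_ADJ = {
    "A": ["B", "D"],
    "B": ["A", "C", "E"],
    "C": ["B", "D"],
    "D": ["A", "C", "E"],
    "E": ["B", "D"],
}
# Hypothetical virtual ranks encoded as (layer, switch id).
TOY_RANK = {"A": (0, 0), "B": (1, 1), "C": (0, 2), "D": (1, 3), "E": (2, 4)}

if __name__ == "__main__":
    print(df_k_shortest_paths(TOY_ADJ, TOY_RANK, "A", "C", 2))  # [['A', 'B', 'C'], ['A', 'D', 'C']]
    print(df_k_shortest_paths(TOY_ADJ, TOY_RANK, "B", "D", 3))  # [['B', 'E', 'D']]
```

In this toy example, B and D are joined by three two-hop paths (via A, C, and E), but only B-E-D avoids a down-to-up turn, so up-down routing keeps one of the three. This is the kind of lost path diversity the abstract identifies as FC's bottleneck. Enumerating every path prefix grows exponentially with path length, so this breadth-first version only suits small illustrations; a larger implementation might instead adapt Yen's K-shortest-paths algorithm to the same constraint.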