Threshold-Based Routing-Topology Co-Design for Optical Data Center.
Peirui Cao, Shizhen Zhao, Dai Zhang, Zhuotao Liu, Mingwei Xu, Min Yee Teh, Yunzhuo Liu, Xinbing Wang, Chenghu Zhou
DOI: https://doi.org/10.1109/tnet.2023.3265276
2023-01-01
IEEE/ACM Transactions on Networking
Abstract: Despite the bandwidth scaling limit of electrical switching and the high cost of building Clos data center networks (DCNs), the adoption of optical DCNs remains limited, for two reasons. First, existing optical DCN designs usually entail high deployment complexity. Second, these designs are not fully optical, and their performance benefit over a non-blocking Clos DCN is unclear. After exploring the design tradeoffs of existing optical DCN designs, we propose TROD (Threshold Routing based Optical Datacenter), a low-complexity optical DCN that performs better than other optical DCNs. Two novel designs contribute to TROD's success. First, TROD performs robust topology optimization based on recurring traffic patterns and thus does not need to react to every traffic change, which lowers deployment and management complexity. Second, TROD introduces tVLB (threshold-based Valiant Load Balance), which avoids network congestion as much as possible even under unexpected traffic bursts. We conduct simulations based on both Facebook’s real DCN traces and our synthesized, highly bursty DCN traces. TROD reduces flow completion time (FCT) by about $1.15\times$ to $2.16\times$ compared to Google’s Jupiter DCN, by at least $2\times$ compared to other optical DCN designs, and by about $2.4\times$ to $3.2\times$ compared to an expander-graph DCN. Compared with the non-blocking Clos, TROD reduces the hop count of most packets by one, and it could even outperform the non-blocking Clos with proper bandwidth over-provisioning at the optical layer. Note that TROD can be built with commercially available hardware and does not require host modifications.
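The abstract does not spell out the routing rule behind tVLB. Purely as an illustration of what "threshold-based" load balancing can mean, the sketch below assumes one plausible reading: a pod-to-pod demand uses the direct optical path up to a fraction `threshold` of that path's capacity, and only the excess is spread evenly over two-hop detours through the other pods, as in classic Valiant load balancing. The function `tvlb_route`, its parameters, and the even-split rule are hypothetical and are not taken from the paper.

```python
from typing import Dict, Tuple

# A path is a tuple of pod indices, e.g. (src, dst) or (src, k, dst).
Path = Tuple[int, ...]


def tvlb_route(demand: float, src: int, dst: int, num_pods: int,
               direct_capacity: float, threshold: float) -> Dict[Path, float]:
    """Split one pod-to-pod demand between the direct path and two-hop paths.

    Hypothetical rule (an assumption, not TROD's published algorithm):
    traffic up to threshold * direct_capacity stays on the direct
    src->dst path; only the excess is spread evenly over the two-hop
    paths src->k->dst through every other pod k.
    """
    if not 0.0 <= threshold <= 1.0:
        raise ValueError("threshold must be in [0, 1]")
    if src == dst:
        raise ValueError("src and dst must differ")

    # Amount the direct path may carry before traffic spills over.
    limit = threshold * direct_capacity
    direct = min(demand, limit)
    split: Dict[Path, float] = {(src, dst): direct}

    excess = demand - direct
    if excess > 0:
        intermediates = [k for k in range(num_pods) if k not in (src, dst)]
        if not intermediates:
            # No detour exists (only two pods): keep everything direct.
            split[(src, dst)] = demand
            return split
        # Classic VLB-style even spreading of the excess over detours.
        share = excess / len(intermediates)
        for k in intermediates:
            split[(src, k, dst)] = share
    return split


if __name__ == "__main__":
    # 30 units of demand, direct capacity 40, threshold 0.5 -> 20 direct,
    # the remaining 10 split over pods 2 and 3.
    print(tvlb_route(30.0, 0, 1, 4, 40.0, 0.5))
```

Under this assumed rule, demand that stays below the threshold never takes an extra hop, which matches the abstract's point that most packets travel one hop fewer than in a Clos fabric, while bursts beyond the threshold are absorbed by detours instead of congesting the direct link.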