EdgeCross: Cloud Scale Traffic Management at Peering Edges
Xiaoliang Wang, Penghui Mi, Yong Zhu, Baoyi An, Yinhua Wang, Lixiang Wang, Xuezhi Yu, Qiong Xie, Xiang Huang, Mingliang Yin, Chaoyang Ji, Wei Sun, Yihang Lv, Yuhang Chen, Cam-Tu Nguyen, Chen Tian, Xiaoming Fu
DOI: https://doi.org/10.1145/3696396
2024-01-01
Abstract: Cloud providers have deployed dozens of PoPs and data centers globally to serve billions of geo-distributed users. Traffic management at peering edges has become a key capability for cloud network operators to meet users' diverse demands. With the rapid growth of cloud applications, users have recently raised new performance requirements, e.g., achieving latency as low as possible rather than merely meeting a specified delay target. The conventional inter-domain bandwidth allocation approach, which aims to reduce the high operating expenditure of bandwidth usage, fails to meet these new requirements. We further reveal that flow scheduling among PoPs may fail because of limited link capacity that is hidden by the controller of the cloud's private backbone network. We therefore seek a new approach to traffic management at peering edges. We propose a new controller framework, EdgeCross, that not only satisfies users' emerging demands but also maintains low operating costs. The large number of fine-grained, application-aware flows and the need to account for backbone link capacity make routing computation and verification highly complex for the controller. EdgeCross introduces a two-phase operation that first performs low-cost bandwidth allocation under the standard 95th percentile billing model and then assigns specified flows to peering edges based on users' requirements. EdgeCross further reduces memory consumption through an effective routing table compression approach. An evaluation on a production network with 16 PoPs shows that EdgeCross can process the routes of 1 billion flows in 10 seconds, reduce the average delay of performance-sensitive flows by 2 milliseconds compared to traditional BGP, and save 10-26% of bandwidth cost compared to the state-of-the-art Cascara.
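The 95th percentile billing model mentioned in the abstract can be illustrated with a minimal sketch. It assumes the common burstable-billing convention: usage on a link is sampled at fixed intervals over the billing period, the top 5% of samples are discarded, and the link is billed at the highest remaining sample. The function name `billable_p95` and the sample values are hypothetical and are not taken from EdgeCross itself.

```python
def billable_p95(samples_mbps):
    """Return the billable rate under the assumed 95th percentile convention."""
    if not samples_mbps:
        raise ValueError("need at least one sample")
    ordered = sorted(samples_mbps)
    n = len(ordered)
    # Integer form of ceil(0.95 * n) - 1: the highest sample that remains
    # after the top 5% of samples are discarded.
    index = (95 * n + 99) // 100 - 1
    return ordered[index]


# Hypothetical billing period with 100 samples: 95 at 100 Mbps, 5 bursts at 900 Mbps.
few_bursts = [100.0] * 95 + [900.0] * 5
# One more burst interval pushes a burst sample into the billable position.
many_bursts = [100.0] * 94 + [900.0] * 6

print(billable_p95(few_bursts))
print(billable_p95(many_bursts))
```

Under this convention, bursts that fit within the discarded 5% of intervals do not raise the bill, which is the slack a cost-aware bandwidth allocator can exploit when spreading traffic across peering links.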