Increasing Resilience of SD-WAN by Distributing the Control Plane

Friedrich Altheide, Simon Buttgereit, Michael Rossberg, Guenter Schaefer
DOI: https://doi.org/10.1109/nof58724.2023.10302816
2024-07-19
IEEE Transactions on Network and Service Management
Abstract: Modern WAN interconnects utilize SD-WAN to respond automatically to network changes and to improve link utilization, latency, and availability. To this end, they incorporate controllers with a centralized view, which collect network state from managed gateways, calculate suitable forwarding actions, and distribute them accordingly. However, this centralization limits the robustness and availability of the network control plane, especially in the event of node failures or partial network outages. In this paper, we propose a distributed and highly robust SD-WAN control plane without any central or regional controller. Our solution can handle arbitrary device failures as well as network partitioning. The distributed forwarding decisions are based on user-defined, dynamically evaluated path cost functions and consider not only path quality but also quality fluctuations. The evaluation shows that, in terms of computation, our approach can handle several thousand SD-WAN gateways and hundreds of network policies. Further, the communication overhead introduced by the distributed architecture is discussed and shown to be negligible compared to a central approach. This paper is an extended version of our work published in 2023. It introduces novel insights, including an in-depth analysis of the information transmitted between sites, a new strategy for policy deployment, a discussion and detailed analysis of approaches that reduce communication bandwidth, and a method for grouping multiple flows without the need for explicit coordination.
computer science, information systems