Dependable Virtualized Fabric on Programmable Data Plane
Kaihui Gao, Shuai Wang, Kun Qian, Dan Li, Rui Miao, Bo Li, Yu Zhou, Ennan Zhai, Chen Sun, Jiaqi Gao, Dai Zhang, Binzhang Fu, Frank Kelly, Dennis Cai, Hongqiang Harry Liu, Yan Li, Hongwei Yang, Tao Sun
DOI: https://doi.org/10.1109/tnet.2022.3224617
2023-01-01
Abstract: In modern multi-tenant data centers, each tenant expects dependable service from the virtualized network fabric: bandwidth guarantees with work conservation, bounded tail latency, and resilient reachability. However, prior work converges slowly under network dynamics and uncertainties and therefore can hardly provide this dependability. Furthermore, state-of-the-art load-balancing schemes are guarantee-agnostic and risk breaking bandwidth guarantees, a problem that prior work overlooks. In this paper, we propose vFab, a dependable virtualized fabric framework that can (1) quickly detect network failures in the data plane, (2) explicitly select proper paths for all flows, and (3) converge to the ideal bandwidth allocation at sub-millisecond timescales. The core idea of vFab is to leverage the programmable data plane to fuse an active edge (e.g., the NIC) with an informative core (e.g., the switch): the core sends link status and tenant information to the edge via telemetry, which helps the edge make timely and accurate decisions on path selection and traffic admission. We fully implement vFab with commodity SmartNICs and programmable switches. Extensive evaluations show that vFab maintains bandwidth guarantees with high bandwidth utilization, low and bounded latency, and resilient reachability under various network scenarios, with limited overhead. Application-level experiments show that vFab improves QPS by $2.4\times$ and cuts tail latency by $10\times$ compared to the alternatives.
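
To make the edge/core split in the abstract concrete, the following is a minimal illustrative sketch of how an edge might use core-supplied telemetry for guarantee-aware path selection and work-conserving admission. It is not the vFab algorithm: the `PathReport` fields, the headroom rule in `select_path`, and the proportional sharing rule in `admitted_rate` are all assumptions introduced here for illustration only.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class PathReport:
    """Hypothetical per-path telemetry summary as seen by the edge."""
    path_id: int
    alive: bool              # False once the core has reported a link failure
    capacity_gbps: float     # capacity of the path's bottleneck link
    subscribed_gbps: float   # sum of guarantees already placed on the path


def select_path(reports: Sequence[PathReport],
                guarantee_gbps: float) -> Optional[int]:
    """Pick the live path with the most guarantee headroom, or None.

    Assumed rule: a path qualifies only if its unsubscribed capacity
    can hold the new flow's guarantee in full.
    """
    best_id: Optional[int] = None
    best_headroom = -1.0
    for r in reports:
        if not r.alive:
            continue  # skip paths the core marked as failed
        headroom = r.capacity_gbps - r.subscribed_gbps
        if headroom >= guarantee_gbps and headroom > best_headroom:
            best_id, best_headroom = r.path_id, headroom
    return best_id


def admitted_rate(report: PathReport, guarantee_gbps: float) -> float:
    """Guarantee plus a guarantee-proportional share of spare capacity.

    Assumed rule: spare capacity is split among all guarantees on the
    path in proportion to their size, so the link stays fully used
    (work conservation) without dipping below any guarantee.
    """
    total_guarantee = report.subscribed_gbps + guarantee_gbps
    spare = max(0.0, report.capacity_gbps - total_guarantee)
    return guarantee_gbps + spare * guarantee_gbps / total_guarantee


reports = [
    PathReport(path_id=0, alive=True, capacity_gbps=100.0, subscribed_gbps=95.0),
    PathReport(path_id=1, alive=False, capacity_gbps=100.0, subscribed_gbps=0.0),
    PathReport(path_id=2, alive=True, capacity_gbps=100.0, subscribed_gbps=40.0),
]
chosen = select_path(reports, guarantee_gbps=10.0)
```

In this toy input, path 1 is skipped because it is marked failed and path 0 lacks headroom for a 10 Gbps guarantee, so a guarantee-agnostic balancer that ignored `subscribed_gbps` could have placed the flow where its guarantee cannot be met.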