VTrace: Automatic Diagnostic System for Persistent Packet Loss in Cloud-Scale Overlay Network

Chongrong Fang,Haoyu Liu,Mao Miao,Jie Ye,Lei Wang,Wansheng Zhang,Daxiang Kang,Biao Lyv,Peng Cheng,Jiming Chen
DOI: https://doi.org/10.1145/3387514.3405851
2020-01-01
Abstract:Persistent packet loss in the cloud-scale overlay network severely compromises tenant experiences. Cloud providers are keen to automatically and quickly determine the root cause of such problems. However, existing work is either designed for the physical network or insufficient to present the concrete reason of packet loss. In this paper, we propose to record and analyze the on-site forwarding condition of packets during packet-level tracing. The cloud-scale overlay network presents great challenges to achieve this goal with its high network complexity, multi-tenant nature, and diversity of root causes. To address these challenges, we present VTrace, an automatic diagnostic system for persistent packet loss over the cloud-scale overlay network. Utilizing the "fast path-slow path" structure of virtual forwarding devices (VFDs), e.g., vSwitches, VTrace installs several "coloring, matching and logging" rules in VFDs to selectively track the packets of interest and inspect them in depth. The detailed forwarding situation at each hop is logged and then assembled to perform analysis with an efficient path reconstruction scheme. Experiments are conducted to demonstrate VTrace's low overhead and quick responsiveness. We share experiences of how VTrace efficiently resolves persistent packet loss issues after deploying it in Alibaba Cloud for over 20 months.
What problem does this paper attempt to address?