Torp: Full-Coverage and Low-Overhead Profiling of Host-Side Latency

Xiang Chen, Hongyan Liu, Junyi Guo, Xinyue Jiang, Qun Huang, Dong Zhang, Chunming Wu, Haifeng Zhou
DOI: https://doi.org/10.1109/infocom48880.2022.9796758
2022-01-01
Abstract: In data center networks (DCNs), host-side packet processing accounts for a large portion of the end-to-end latency of TCP flows. Profiling host-side latency anomalies is therefore considered a crucial part of DCN performance diagnosis and troubleshooting. In particular, such profiling requires full coverage (i.e., profiling every TCP packet handled by end-hosts) and low overhead (i.e., profiling should avoid high CPU consumption on end-hosts). However, existing solutions rely entirely on end-hosts to implement host-side latency profiling, leading to low coverage or high overhead. In this paper, we propose Torp, a framework that offers full-coverage and low-overhead profiling of host-side latency. Our key idea is to offload profiling operations to top-of-rack (ToR) switches, which inherently offer full coverage and line-rate packet processing performance. Specifically, Torp selectively offloads profiling operations to the ToR switch based on switch limitations, and it efficiently coordinates the ToR switch and end-hosts to execute the entire latency profiling task. We have implemented Torp on 32×100 Gbps Tofino switches. Testbed experiments indicate that Torp achieves full coverage and orders of magnitude lower host-side overhead compared to other solutions.