P-PFC: Reducing Tail Latency with Predictive PFC in Lossless Data Center Networks
Chen Tian, Bo Li, Liulan Qin, Jiaqi Zheng, Jie Yang, Wei Wang, Guihai Chen, Wanchun Dou
DOI: https://doi.org/10.1109/tpds.2020.2969182
IF: 5.3
2020-06-01
IEEE Transactions on Parallel and Distributed Systems
Abstract: Remote Direct Memory Access (RDMA) technology is rapidly changing the landscape of today's datacenter applications. Congestion control for RDMA networking is a critical challenge. As an end-to-end layer 3 congestion control mechanism, Datacenter QCN (DCQCN) alleviates the unfairness and head-of-line blocking problems of Priority-based Flow Control (PFC). However, a lossless network does not guarantee low latency, even with DCQCN enabled. When network congestion occurs, switch queues still build up because of the response latency of end-to-end solutions. In this article, we propose Predictive PFC (P-PFC) to reduce tail latency in RDMA networks. P-PFC monitors the derivative of buffer occupancy, predicts when PFC will be triggered, and proactively sends a PFC pause in advance. The benefit is that buffer usage is kept at a low level, so tail latency can be controlled. Preliminary evaluation results demonstrate that, in many scenarios, P-PFC cuts tail latency by more than half relative to standard PFC, without hurting throughput or average latency. In our experiments, P-PFC also protects innocent flows better than standard PFC. To the best of our knowledge, this is the first work to use the derivative to improve PFC in lossless RDMA networks.
computer science, theory & methods; engineering, electrical & electronic
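The prediction step described in the abstract (monitor the derivative of buffer occupancy, predict a future PFC trigger, pause early) can be illustrated with a minimal sketch. This is not the authors' algorithm: it assumes a simple finite-difference slope and a linear extrapolation over a fixed look-ahead window, and the class name, `xoff_threshold`, and `horizon` are hypothetical names chosen for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class PredictivePauseSketch:
    """Illustrative only: linear extrapolation of queue occupancy.

    All names and parameters are hypothetical, not taken from the paper.
    """
    xoff_threshold: float  # occupancy (bytes) at which standard PFC would pause
    horizon: float         # assumed look-ahead window, in seconds
    last_occupancy: Optional[float] = None
    last_time: Optional[float] = None

    def should_pause(self, occupancy: float, now: float) -> bool:
        """Return True if a pause frame should be sent at this sample."""
        # Standard PFC behaviour: pause once the threshold is reached.
        if occupancy >= self.xoff_threshold:
            self._remember(occupancy, now)
            return True

        pause = False
        if self.last_time is not None and now > self.last_time:
            # Finite-difference estimate of the occupancy derivative (bytes/s).
            slope = (occupancy - self.last_occupancy) / (now - self.last_time)
            # Extrapolate one horizon ahead; pause early if the queue is
            # growing and the prediction would cross the threshold.
            predicted = occupancy + slope * self.horizon
            pause = slope > 0 and predicted >= self.xoff_threshold

        self._remember(occupancy, now)
        return pause

    def _remember(self, occupancy: float, now: float) -> None:
        self.last_occupancy = occupancy
        self.last_time = now


if __name__ == "__main__":
    port = PredictivePauseSketch(xoff_threshold=100_000, horizon=10e-6)
    print(port.should_pause(10_000, 0.0))   # first sample, no slope yet
    print(port.should_pause(20_000, 1e-6))  # fast growth, well below threshold
```

In this sketch the second sample is still far below the threshold, but the queue is growing fast enough that the extrapolated occupancy crosses it within the look-ahead window, so the pause is issued early. A slowly growing queue at the same occupancy would not trigger one.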