Micro-burst in Data Centers: Observations, Analysis, and Mitigations
Danfeng Shan, Fengyuan Ren, Peng Cheng, Ran Shu, Chuanxiong Guo
DOI: https://doi.org/10.1109/icnp.2018.00019
2018-01-01
Abstract: Micro-burst traffic is not uncommon in data centers. It can cause packet drops, which result in serious performance degradation (e.g., the Incast problem). However, current solutions that attempt to suppress micro-burst traffic are extrinsic and ad hoc, because they lack a comprehensive and fundamental understanding of the root cause and dynamic behavior of micro-bursts. On the other hand, traditional studies focus on traffic burstiness within a single flow, whereas in data centers micro-burst traffic can arise under highly fan-in communication patterns, and its dynamic behavior is still unclear. To this end, in this paper we re-examine micro-burst traffic in typical data center scenarios. We find that the evolution of a micro-burst is determined by both TCP's self-clocking mechanism and the bottleneck link. In addition, the dynamic behavior of micro-bursts under various scenarios can be described by the slope of the queue length evolution. Our observations also imply that conventional solutions such as absorbing and pacing are ineffective at mitigating micro-burst traffic; instead, senders need to slow down as soon as possible. Inspired by these experimental findings and insights, we propose the S-ECN policy, an ECN marking policy that leverages the slope of the queue length evolution. Transport protocols using the S-ECN policy can suppress the sharp increase in queue length by over 2× and reduce the average query completion time by approximately 12-27%.
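The abstract does not spell out the exact S-ECN marking rule, so the following is only a minimal sketch of the general idea of slope-aware ECN marking, under loudly stated assumptions. The queue slope is estimated as a finite difference between consecutive enqueue samples, and a packet is marked if either the queue length exceeds a static threshold `k_bytes` or the slope exceeds `slope_thresh`. The class name `SlopeEcnMarker`, the helper `threshold_only`, both threshold values, the OR-combination, and the sample trace are hypothetical illustrations, not the paper's specification or data.

```python
from dataclasses import dataclass


@dataclass
class SlopeEcnMarker:
    """Hypothetical slope-aware ECN marker (illustrative; not the paper's exact S-ECN rule)."""

    k_bytes: int               # hypothetical static queue-length threshold, in bytes
    slope_thresh: float        # hypothetical slope threshold, in bytes per microsecond
    prev_qlen: int = 0         # queue length at the previous sample
    prev_time_us: float = 0.0  # timestamp of the previous sample
    slope: float = 0.0         # most recent estimate of d(qlen)/dt

    def on_enqueue(self, qlen_bytes: int, now_us: float) -> bool:
        """Update the slope estimate and return True if this packet should be ECN-marked."""
        dt = now_us - self.prev_time_us
        if dt > 0:
            # Finite-difference estimate of how fast the queue is growing.
            self.slope = (qlen_bytes - self.prev_qlen) / dt
        self.prev_qlen = qlen_bytes
        self.prev_time_us = now_us
        # Assumption: mark early when the queue is rising sharply, even below K,
        # and fall back to conventional threshold marking once the queue exceeds K.
        return qlen_bytes > self.k_bytes or self.slope > self.slope_thresh


def threshold_only(qlen_bytes: int, k_bytes: int) -> bool:
    """Conventional single-threshold marking (DCTCP-style), for comparison."""
    return qlen_bytes > k_bytes


# Hypothetical micro-burst: (time in microseconds, queue length in bytes) sampled at enqueue.
trace = [(0, 0), (10, 500), (20, 1000), (30, 6000), (40, 12000),
         (50, 40000), (60, 39000), (70, 20000)]

marker = SlopeEcnMarker(k_bytes=30_000, slope_thresh=100.0)
slope_marks = [marker.on_enqueue(q, t) for t, q in trace]
thresh_marks = [threshold_only(q, 30_000) for _, q in trace]

print(slope_marks)   # [False, False, False, True, True, True, True, False]
print(thresh_marks)  # [False, False, False, False, False, True, True, False]
```

On this synthetic trace, the slope-aware rule begins marking at the fourth sample, when the queue jumps from 1,000 to 6,000 bytes in 10 microseconds, while threshold-only marking waits until the queue passes 30,000 bytes at the sixth sample. Earlier marking is the kind of prompt slow-down signal the abstract argues senders need; the real S-ECN thresholds and slope estimator may differ.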