Taming TCP Incast Throughput Collapse in Data Center Networks

Jiao Zhang, Fengyuan Ren, Li Tang, Chuang Lin
DOI: https://doi.org/10.1109/icnp.2013.6733609
2013-01-01
Abstract: The TCP incast problem has attracted considerable attention because it is widespread in cloud services and causes catastrophic performance degradation. Several solutions have been proposed, yet industry, Facebook for example, still struggles with it. Based on the observation that TCP incast is mainly caused by TimeOuts (TOs) occurring at the boundaries of stripe units, this paper presents a simple and effective TCP enhancement mechanism, called GIP (Guarantee Important Packets), for applications that suffer from TCP incast. The main idea is to make TCP aware of stripe-unit boundaries, to reduce the congestion window of each flow at the start of each stripe unit, and to redundantly transmit the last packet of each stripe unit. GIP requires only small modifications to TCP at the end hosts, so it is easy to implement, and it has no impact on other TCP-based applications. Results from both testbed experiments and ns-2 simulations demonstrate that TCP with GIP avoids almost all TOs and achieves high goodput for applications with the incast communication pattern.
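The two actions named in the abstract can be illustrated with a toy sender model. This is only a sketch under stated assumptions, not the authors' implementation: the class name, the field names, the value the window is reduced to, and the number of redundant copies are all hypothetical, since the abstract does not specify them.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class GipSenderSketch:
    """Toy model of one flow. Hypothetical; not the authors' code."""
    cwnd: int = 10             # current congestion window in packets (assumed value)
    reduced_cwnd: int = 2      # window used at a stripe-unit start (assumed value)
    redundant_copies: int = 1  # extra copies of the last packet (assumed value)

    def start_stripe_unit(self) -> None:
        # Reduce the window at the stripe-unit boundary; never raise it.
        self.cwnd = min(self.cwnd, self.reduced_cwnd)

    def schedule_unit(self, num_packets: int) -> List[int]:
        """Return the packet indices to transmit for one stripe unit."""
        self.start_stripe_unit()
        seqs = list(range(num_packets))
        if num_packets > 0:
            # Redundantly transmit the last packet of the stripe unit.
            seqs.extend([num_packets - 1] * self.redundant_copies)
        return seqs
```

With the assumed defaults, scheduling a four-packet stripe unit yields the indices 0, 1, 2, 3, 3 and leaves the window at the reduced value, so a single loss of the final packet need not stall the unit until a timeout.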