Meeting Coflow Deadlines in Data Center Networks with Policy-Based Selective Completion.
Shouxi Luo, Pingzhi Fan, Huanlai Xing, Hongfang Yu
DOI: https://doi.org/10.1109/tnet.2022.3187821
2023-01-01
IEEE/ACM Transactions on Networking
Abstract: Recently, the coflow abstraction has been introduced to capture the collective data transmission patterns of modern distributed data-parallel applications. During processing, coflows generally act as barriers; accordingly, time-sensitive applications prefer their coflows to complete within deadlines, which makes deadline-aware coflow scheduling crucial. We observe that many of these data-parallel applications, including large-scale query systems, distributed iterative training, and erasure-code-enabled storage, can tolerate loss-bounded incomplete inputs by design. This tolerance opens a flexible design space for scheduling their coflows: when overloaded, the network can trade coflow completeness for timeliness and balance the completeness of different coflows on demand. Unfortunately, existing coflow schedulers neglect this tolerance, resulting in inflexible and inefficient bandwidth allocations. In this paper, we explore this fundamental trade-off and design POCO, a POlicy-based COflow scheduler, along with a transport-layer enhancement scheme, to achieve customizable selective coflow completion for emerging time-sensitive distributed applications. Internally, POCO employs a suite of novel designs along with admission control to make flexible, work-conserving, and performance-guaranteed rate allocations for online coflow requests efficiently. Extensive trace-based simulations indicate that POCO is highly flexible and achieves optimal coflow schedules that respect the requirements specified by applications.
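To make the completeness-for-timeliness trade-off concrete, the following is a minimal illustrative sketch and not POCO's actual algorithm. It assumes a single bottleneck link, constant per-coflow rates until a shared deadline, and first-come-first-served admission; the names `Coflow`, `min_completeness`, and `schedule` are hypothetical and do not come from the paper.

```python
from dataclasses import dataclass


@dataclass
class Coflow:
    name: str
    volume: float            # total data to send over the bottleneck (e.g. Gb)
    deadline: float          # seconds available before the deadline
    min_completeness: float  # smallest acceptable delivered fraction, in (0, 1]


def schedule(coflows, capacity):
    """Toy admission control plus rate allocation on one link.

    Assumption: each admitted coflow sends at a constant rate until its
    deadline. Returns {name: delivered fraction} for admitted coflows.
    """
    admitted, rates = [], {}
    spare = capacity
    # Step 1: admit a coflow only if its minimum acceptable share still fits.
    for c in coflows:
        need = c.volume * c.min_completeness / c.deadline
        if need <= spare + 1e-12:
            admitted.append(c)
            rates[c.name] = need
            spare -= need
    # Step 2: hand leftover bandwidth to admitted coflows in admission order,
    # never exceeding the rate needed for full completion.
    for c in admitted:
        full_rate = c.volume / c.deadline
        extra = min(spare, full_rate - rates[c.name])
        rates[c.name] += extra
        spare -= extra
    return {c.name: min(1.0, rates[c.name] * c.deadline / c.volume)
            for c in admitted}


requests = [
    Coflow("A", volume=80.0, deadline=10.0, min_completeness=0.75),
    Coflow("B", volume=60.0, deadline=10.0, min_completeness=0.5),
    Coflow("C", volume=40.0, deadline=10.0, min_completeness=1.0),
]
result = schedule(requests, capacity=10.0)
```

In this toy setting the link cannot carry all three coflows in full, so A and B are admitted at reduced completeness (0.875 and 0.5 respectively) while C, which tolerates no loss, is rejected. A real policy-based scheduler would let applications choose how the leftover bandwidth is balanced across coflows rather than fixing the admission-order rule used here.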