Django: Bilateral coflow scheduling with predictive concurrent connections

Jiaqi Zheng, Liulan Qin, Kexin Liu, Bingchuan Tian, Chen Tian, Bo Li, Guihai Chen
DOI: https://doi.org/10.1016/j.jpdc.2021.01.006
IF: 4.542
2021-06-01
Journal of Parallel and Distributed Computing
Abstract:<p>Communication in data-parallel frameworks is highly structured. Coflow is a networking abstraction proposed to capture the <em>all-or-nothing</em>, job-specific semantics of this communication. Minimizing coflow completion time (CCT) decreases the completion time of the corresponding jobs. However, state-of-the-art coflow scheduling approaches suffer from the following drawbacks. On the one hand, both sender-driven and receiver-driven scheduling approaches fail to achieve optimal performance, especially when a bandwidth bottleneck exists. On the other hand, they fail to optimize the number of concurrent connections, even though too many or too few concurrent connections can prolong the CCT.</p><p>In this paper, we propose Django, a bilateral coflow scheduling framework. We first use a Support Vector Machine (SVM) as the machine learning model to automatically identify the optimal number of concurrent connections, <em>i.e.</em>, the queue limitation in the switch. Based on the predicted results, we further develop a set of scalable, distributed coflow scheduling algorithms. Testbed experiments and trace-driven simulations show that Django can estimate the number of concurrent connections with an accuracy of 98% and reduce the average CCT and 95th percentile CCT by 15% and 40%, respectively.</p>
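As a rough illustration of the metrics quoted in the abstract, the sketch below computes coflow completion time under the all-or-nothing semantics, assuming CCT is measured from a coflow's arrival until its last member flow finishes, and then reports the average CCT and a nearest-rank 95th percentile CCT. The `Coflow` record, the function names, and the nearest-rank percentile convention are hypothetical choices for illustration, not Django's implementation or the paper's exact evaluation code.

```python
import math
from dataclasses import dataclass


@dataclass
class Coflow:
    """Hypothetical record: one coflow and the finish times of its member flows."""
    arrival: float                   # time the coflow is released
    flow_finish_times: list[float]   # completion time of each member flow


def cct(c: Coflow) -> float:
    # All-or-nothing semantics: the coflow completes only when its LAST flow finishes.
    return max(c.flow_finish_times) - c.arrival


def percentile(values: list[float], p: float) -> float:
    # Nearest-rank percentile (an assumed convention): the smallest sample
    # such that at least p% of the samples are less than or equal to it.
    ordered = sorted(values)
    rank = math.ceil(p / 100 * len(ordered))
    return ordered[max(rank, 1) - 1]


def cct_stats(coflows: list[Coflow]) -> tuple[float, float]:
    # Returns (average CCT, 95th percentile CCT) over a set of coflows.
    ccts = [cct(c) for c in coflows]
    return sum(ccts) / len(ccts), percentile(ccts, 95)


if __name__ == "__main__":
    demo = [
        Coflow(0.0, [2.0, 5.0, 3.0]),  # CCT = 5.0: the slowest flow decides
        Coflow(1.0, [4.0, 2.5]),       # CCT = 3.0
        Coflow(2.0, [10.0]),           # CCT = 8.0
    ]
    avg, p95 = cct_stats(demo)
    print(f"average CCT = {avg:.3f}, 95th percentile CCT = {p95:.1f}")
```

Note how a single slow flow determines the whole coflow's CCT, which is why scheduling decisions that speed up some flows but leave one straggler do not improve CCT.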
computer science, theory & methods