An Improved Bound for Minimizing the Total Weighted Completion Time of Coflows in Datacenters

Mehrnoosh Shafiee, Javad Ghaderi
DOI: https://doi.org/10.48550/arXiv.1704.08357
2017-04-27
Abstract: In data-parallel computing frameworks, intermediate parallel data is often produced at various stages and must be transferred among servers in the datacenter network (e.g., the shuffle phase in MapReduce). A stage often cannot start or be completed unless all the required data pieces from the preceding stage have been received. Coflow is a recently proposed networking abstraction that captures such communication patterns. We consider the problem of efficiently scheduling coflows with release dates in a shared datacenter network so as to minimize the total weighted completion time of coflows. Several heuristics have been proposed recently to address this problem, as well as a few polynomial-time approximation algorithms with provable performance guarantees. Our main result in this paper is a polynomial-time deterministic algorithm that improves on the previously known results. Specifically, we propose a deterministic algorithm with an approximation ratio of $5$, improving on the best previously known ratio of $12$. For the special case in which all coflows are released at time zero, our deterministic algorithm achieves an approximation ratio of $4$, improving on the best previously known ratio of $8$. The key ingredient of our approach is an improved linear program formulation for sorting the coflows, followed by a simple list scheduling policy. Extensive simulations, using both synthetic and real traffic traces, verify the performance of our algorithm and show improvement over prior approaches.
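To make the objective and the list scheduling step concrete, the following is a minimal sketch rather than the paper's algorithm. It assumes a slotted switch model in which each source port sends and each destination port receives at most one data unit per time slot, and it takes the coflow order as given, whereas the paper derives the order from a linear program. The names `list_schedule`, `total_weighted_completion_time`, and the dictionary keys are hypothetical.

```python
def list_schedule(coflows, order):
    """Greedy list scheduling sketch (assumed model, not the paper's exact policy).

    coflows: list of dicts with keys 'weight', 'release', and 'demand',
             where 'demand' maps (src_port, dst_port) to a number of data units.
    order:   a permutation of coflow indices, highest priority first.
    Returns the completion time of every coflow.
    """
    remaining = [dict(c["demand"]) for c in coflows]
    completion = [None] * len(coflows)
    # A coflow with no demand is complete as soon as it is released.
    for k, c in enumerate(coflows):
        if not any(remaining[k].values()):
            completion[k] = c["release"]
    t = 0
    while any(done is None for done in completion):
        busy_src, busy_dst = set(), set()  # ports already used in slot t
        for k in order:
            if completion[k] is not None or coflows[k]["release"] > t:
                continue
            for (src, dst), units in sorted(remaining[k].items()):
                if units > 0 and src not in busy_src and dst not in busy_dst:
                    remaining[k][(src, dst)] = units - 1
                    busy_src.add(src)
                    busy_dst.add(dst)
            if not any(remaining[k].values()):
                completion[k] = t + 1  # slot t ends at time t + 1
        t += 1
    return completion


def total_weighted_completion_time(coflows, completion):
    """Objective from the abstract: sum of weight times completion time."""
    return sum(c["weight"] * done for c, done in zip(coflows, completion))


# Two coflows that compete for source port 0 (illustrative data only).
example = [
    {"weight": 1, "release": 0, "demand": {(0, 0): 2}},
    {"weight": 3, "release": 0, "demand": {(0, 1): 1}},
]
cost_light_first = total_weighted_completion_time(example, list_schedule(example, [0, 1]))
cost_heavy_first = total_weighted_completion_time(example, list_schedule(example, [1, 0]))
print(cost_light_first, cost_heavy_first)  # prints "11 6"
```

In this toy instance, serving the heavier coflow first lowers the objective from 11 to 6, which illustrates why the quality of the ordering matters for the resulting schedule.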
Data Structures and Algorithms, Discrete Mathematics