Scheduling Jobs Across Geo-Distributed Datacenters with Max-Min Fairness.
Li Chen,Shuhao Liu,Baochun Li,Bo Li
DOI: https://doi.org/10.1109/tnse.2018.2795580
IF: 6.6
2018-01-01
IEEE Transactions on Network Science and Engineering
Abstract:It has become routine for large volumes of data to be generated, stored, and processed across geographically distributed datacenters. To run a single data analytic job on such geo-distributed data, recent research proposed to distribute its tasks across datacenters, considering both data locality and network bandwidth across datacenters. Yet, it remains an open problem in the more general case, where multiple analytic jobs need to fairly share the resources at these geo-distributed datacenters. In this paper, we focus on the problem of assigning tasks belonging to multiple jobs across datacenters, with the specific objective of achieving max-min fairness across jobs sharing these datacenters, in terms of their job completion times. We formulate this problem as a lexicographical minimization problem, which is challenging to solve in practice due to its inherent multi-objective and discrete nature. To address these challenges, we iteratively solve its single-objective subproblems, which can be transformed to equivalent linear programming (LP) problems to be efficiently solved, thanks to their favorable properties. As a highlight of this paper, we have designed and implemented our proposed solution as a fair job scheduler based on Apache Spark, a modern data processing framework. With extensive evaluations of our real-world implementation on Amazon EC2, we have shown convincing evidence that max-min fairness has been achieved using our new job scheduler.