Design and Analysis of a New Distributed Scheduling Approach

Wenzhuo Li, Chuang Lin, Chao Xue
DOI: https://doi.org/10.1109/ISCC.2017.8024679
2017-01-01
Abstract: Big data analytics frameworks are moving toward higher degrees of parallelism and shorter task durations in order to achieve lower latency. As a result, millions of scheduling decisions must be made per second, which poses a major challenge for today's centralized schedulers. Many researchers and enterprises have therefore turned to distributed scheduling approaches to avoid the throughput limits of centralized designs. To our knowledge, Omega, Apollo, and Sparrow are three well-known early approaches to distributed scheduling, but each has shortcomings, and none of them adopts a peer-to-peer architecture. We therefore propose Piper, a new scheduling approach that applies the peer-to-peer idea to distributed scheduling and provides near-optimal performance. We have implemented Piper using Apache Thrift, and the results show that Piper reduces job response times by more than 1.5x compared to Sparrow (we chose Sparrow for comparison because it is a leading design and is open source). In addition, trace-driven simulations used to evaluate Piper at large cluster scale further show that Piper outperforms Sparrow.