Metaflow: A Better Traffic Abstraction for Distributed Applications.

Yang Shi, Jiawei Fei, Mei Wen, Qun Huang, Nan Wu
DOI: https://doi.org/10.1109/hpcc/smartcity/dss.2019.00159
2019-01-01
Abstract: Distributed applications usually feature a set of correlated flows between two consecutive computation stages. The scheduling of these flows has a crucial influence on job completion time. Coflow improves performance by optimizing the finish time of the entire set of flows. However, the flows and computing tasks in one application have more complex relationships that exceed the coflow's barrier assumption. In this context, scheduling via the coflow abstraction may hurt application performance. Accordingly, we propose metaflow, a traffic abstraction derived from the computation graph of the application. Metaflow reveals the detailed flow requirements of the application and makes it easier to reduce the job completion time. Based on metaflow, we devise a scheduling heuristic called MSA. MSA identifies shorter jobs more accurately for inter-job scheduling and maximizes the overlap between communication and computation for intra-job scheduling. To demonstrate the effectiveness of our work, we have conducted extensive simulations with both synthetic single jobs and production traces containing multiple jobs. The simulation results verify that MSA adapts well to different jobs and can achieve an average speedup of over 1.98x on a real-life workload.
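The gap between the barrier assumption and per-flow dependencies can be illustrated with a toy calculation. The sketch below is not MSA and is not taken from the paper; the job, the flow sizes, the compute times, the single shared link, and the function names are all hypothetical assumptions chosen for illustration. It compares a job completion time when every task waits for the whole set of flows against one where each task starts as soon as its own flow arrives.

```python
from itertools import permutations

# Hypothetical toy job (not from the paper):
# flow name -> (size in data units, compute time of the task that consumes it)
JOB = {"f1": (2.0, 3.0), "f2": (2.0, 1.0)}
LINK_CAPACITY = 1.0  # data units per second, one link shared by all flows


def jct_barrier(job, capacity=LINK_CAPACITY):
    """Barrier view: every task starts only after the whole set of flows finishes."""
    coflow_finish = sum(size for size, _ in job.values()) / capacity
    return max(coflow_finish + compute for _, compute in job.values())


def jct_ordered(job, order, capacity=LINK_CAPACITY):
    """Per-flow view: flows use the full link one at a time, in the given order,
    and each task starts as soon as its own flow has arrived."""
    clock, jct = 0.0, 0.0
    for name in order:
        size, compute = job[name]
        clock += size / capacity          # this flow finishes at `clock`
        jct = max(jct, clock + compute)   # its task finishes `compute` later
    return jct


def best_order(job):
    """Brute-force the flow order with the smallest job completion time."""
    return min(permutations(job), key=lambda order: jct_ordered(job, order))


if __name__ == "__main__":
    print("barrier JCT:", jct_barrier(JOB))
    for order in permutations(JOB):
        print(order, "JCT:", jct_ordered(JOB, order))
    print("best order:", best_order(JOB))
```

In this toy setup the last flow finishes at the same time under every schedule, so a metric that looks only at the set's finish time cannot tell the schedules apart. Sending the flow that feeds the longer computation first still shortens the job, because that computation overlaps with the remaining transfer.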