Job-Aware Scheduling for Big Data Processing

Zhigang Wang, Yanming Shen
DOI: https://doi.org/10.1109/ccbd.2015.14
2015-01-01
Abstract: Most big data jobs are network-bound, involving large amounts of data transfer among the nodes in a cluster. Optimizing the scheduling of flows can therefore improve big data job performance. Traditional techniques mostly schedule flows individually, without considering the correlations among flows. In this paper, we take the dependencies among flows into account and propose traffic forecasting and job-aware priority scheduling for big data processing. First, we forecast the network traffic for flows of the same job through run-time monitoring, assign a unique priority to each job, and tag every packet of that job accordingly. Then we schedule flows of the same priority (often the same job) in FIFO order. We implement our proposed scheme in the NS-2 simulator and show that our system can increase network utilization and reduce job completion time.
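The scheduling idea in the abstract can be illustrated with a small Python sketch. It is not the authors' NS-2 implementation. It assumes each job's priority is already known (the paper derives it from traffic forecasts, which are not modeled here), that a lower number means the job is served first, and that priority levels are served strictly one after another. The names `Flow`, `JobAwareScheduler`, and `drain_order` are hypothetical.

```python
import heapq
from dataclasses import dataclass, field
from itertools import count


@dataclass(order=True)
class Flow:
    priority: int   # job priority tag; lower value is served first (assumption)
    seq: int        # global arrival counter; gives FIFO order within a priority
    job_id: str = field(compare=False)
    flow_id: str = field(compare=False)


class JobAwareScheduler:
    """Strict priority across jobs, FIFO among flows sharing a priority."""

    def __init__(self, job_priority):
        # job_priority maps job id -> priority; stands in for the
        # forecast-based assignment, which this sketch does not model.
        self._job_priority = dict(job_priority)
        self._heap = []
        self._arrivals = count()

    def enqueue(self, job_id, flow_id):
        # Tag the flow with its job's priority and its arrival order.
        prio = self._job_priority[job_id]
        heapq.heappush(
            self._heap, Flow(prio, next(self._arrivals), job_id, flow_id)
        )

    def dequeue(self):
        # Return the next flow to serve, or None when nothing is queued.
        return heapq.heappop(self._heap) if self._heap else None


def drain_order(job_priority, arrivals):
    """Enqueue (job_id, flow_id) pairs in order, then list the service order."""
    sched = JobAwareScheduler(job_priority)
    for job_id, flow_id in arrivals:
        sched.enqueue(job_id, flow_id)
    order = []
    while (flow := sched.dequeue()) is not None:
        order.append(flow.flow_id)
    return order
```

For example, with job B given priority 0 and job A priority 1, flows arriving as a1, b1, a2, b2 are served as b1, b2, a1, a2: all of B's flows go first, and each job's flows keep their arrival order.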