Efficient Data and Task Co-Scheduling for Scientific Workflow in Geo-Distributed Datacenters

Jian Chen, Jinghui Zhang, Aibo Song
DOI: https://doi.org/10.1109/CBD.2017.19
2017-01-01
Abstract: Scientific workflows often need to be executed across multiple collaborating datacenters in order to access community-wide resources. However, moving initial input data and intermediate data across geo-distributed datacenters hinders the efficient execution of large-scale data-intensive scientific workflows. In this paper, a novel scheduling approach based on graph partitioning is proposed for executing data-intensive scientific workflows in geo-distributed datacenters, with the aim of optimizing the overall data transfer cost. Simulations show that the algorithm significantly reduces the overall geo-distributed data transfer, demonstrating its effectiveness.