Performance-Driven Task and Data Co-scheduling Algorithms for Data-Intensive Applications in Grid Computing

Changqin Huang, Deren Chen, Yao Zheng, Hualiang Hu
DOI: https://doi.org/10.1007/978-3-540-24655-8_36
2004-01-01
Abstract: To achieve higher performance under many constraints, effective scheduling is a key concern in data-intensive grid computing. Based on a Dual-Component and Dual-Queue Distributed Schedule Model (DCDQDSM), we present task and data co-scheduling algorithms that reduce the time a scheduled task waits to access its datasets. First, data replication and elimination are scheduled by an independent approach. Second, if a task is divisible, the task and its dataset are divided into subtasks and the data subsets they require. Task scheduling adopts a general approach. Finally, when a scheduled task or subtask does not hit its dataset, the associated data transfer is bound to that task. Based on the relationship between task execution and data access, data replication and computation may proceed concurrently, either within one scheduled task whose dataset is divisible or between scheduled tasks. Theoretical analysis and experimental results suggest that the scheduling algorithms improve execution performance and resource utilization.
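The abstract does not give the algorithms' details, so the following is only a minimal sketch, under an assumed cost model, of two ideas it names: binding a data transfer to a subtask only when its data subset is not already local (a miss), and overlapping the transfer of one data subset with computation on another. The names `Subtask`, `split_task`, `bound_transfer`, `makespan_sequential`, and `makespan_overlapped` are hypothetical, not taken from the paper, and the model (equal splitting, one serial transfer link, one processor) is an assumption.

```python
from dataclasses import dataclass
from typing import List, Set


@dataclass
class Subtask:
    # Hypothetical cost model: every field here is an illustrative assumption.
    name: str
    data_subset: str      # identifier of the data subset this subtask reads
    compute_time: float   # time to compute on the subset
    transfer_time: float  # time to fetch the subset if it is not local


def split_task(name: str, total_compute: float, total_transfer: float, k: int) -> List[Subtask]:
    """Divide a divisible task and its dataset into k equal subtasks and subsets."""
    return [
        Subtask(f"{name}.{i}", f"{name}.data.{i}", total_compute / k, total_transfer / k)
        for i in range(k)
    ]


def bound_transfer(sub: Subtask, local_data: Set[str]) -> float:
    """Transfer cost bound to a subtask: zero on a hit, full cost on a miss."""
    return 0.0 if sub.data_subset in local_data else sub.transfer_time


def makespan_sequential(subs: List[Subtask], local_data: Set[str]) -> float:
    """Fetch, then compute, one subtask at a time (no overlap)."""
    return sum(bound_transfer(s, local_data) + s.compute_time for s in subs)


def makespan_overlapped(subs: List[Subtask], local_data: Set[str]) -> float:
    """Fetch subset i+1 while subtask i computes (one serial link, one processor)."""
    fetch_done = 0.0
    compute_done = 0.0
    for s in subs:
        fetch_done += bound_transfer(s, local_data)  # transfers run back to back
        # A subtask starts once its data has arrived and the processor is free.
        compute_done = max(compute_done, fetch_done) + s.compute_time
    return compute_done


if __name__ == "__main__":
    subs = split_task("T", total_compute=40.0, total_transfer=40.0, k=4)
    local = {"T.data.0"}  # the first subset is already replicated at the site
    print(makespan_sequential(subs, local))  # 70.0
    print(makespan_overlapped(subs, local))  # 40.0
```

In this toy setting, overlapping hides every transfer after the first behind computation, which mirrors the abstract's point that replication and computation can proceed concurrently within a task whose dataset is divisible. The real gain depends on the actual ratio of transfer cost to compute cost.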