A Novel Time Computation Model Based on Algorithm Complexity for Data Intensive Scientific Workflow Design and Scheduling

Jing He, Yanchun Zhang, Guangyan Huang, Chaoyi Pang
DOI: https://doi.org/10.1002/cpe.1445
2009-01-01
Concurrency and Computation: Practice and Experience
Abstract: Scientific workflow offers a framework for cooperation between remote and shared resources in a grid computing environment (GCE) for scientific discovery. One major function of a scientific workflow is to schedule a collection of computational subtasks in well-defined orders for efficient outputs by estimating task durations at runtime. In this paper, we propose a novel time computation model based on algorithm complexity (termed the TCMAC model) for high-level data intensive scientific workflow design. The proposed model schedules the subtasks based on their durations and the complexities of the participant algorithms. Characterized by its use of task duration computation functions for time efficiency, the TCMAC model has three features for a full-aspect scientific workflow including both dataflow and control-flow: (1) it provides flexible and reusable task duration functions in GCE; (2) it facilitates better parallelism in iteration structures by providing more precise task durations; and (3) it accommodates dynamic task durations for rescheduling in selective structures of control flow. We also present theories and examples in scientific workflows to show the efficiency of the TCMAC model, especially for control-flow. Copyright © 2009 John Wiley & Sons, Ltd.
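The general idea of deriving a task duration from an algorithm's complexity can be illustrated with a minimal sketch. This is not the TCMAC model itself, whose definitions are not given in the abstract. The complexity table, the `ops_per_second` resource parameter, and all function names are hypothetical, chosen only to show how a reusable duration function might feed sequential and parallel estimates.

```python
import math
from typing import Callable, Dict, Iterable

# Hypothetical complexity functions: map an input size n to an operation count.
COMPLEXITY: Dict[str, Callable[[int], float]] = {
    "O(n)": lambda n: float(n),
    "O(n log n)": lambda n: n * math.log2(n) if n > 1 else float(n),
    "O(n^2)": lambda n: float(n) ** 2,
}


def estimate_duration(complexity: str, n: int, ops_per_second: float) -> float:
    """Estimated duration in seconds: operation count divided by resource speed.

    Assumption: a resource is described by a single ops_per_second figure.
    """
    return COMPLEXITY[complexity](n) / ops_per_second


def sequential_duration(durations: Iterable[float]) -> float:
    """Tasks run one after another, so their durations add up."""
    return sum(durations)


def parallel_duration(durations: Iterable[float]) -> float:
    """Independent tasks run concurrently, so the slowest one dominates."""
    return max(durations)


# Three hypothetical subtasks on the same input size and resource.
tasks = [("scan", "O(n)"), ("sort", "O(n log n)"), ("pairwise", "O(n^2)")]
durations = [estimate_duration(c, 1024, ops_per_second=1024.0) for _, c in tasks]
```

Because the duration function depends only on the complexity class, the input size, and the resource speed, the same function can be reused when the input size changes at runtime, which is the kind of re-estimation a rescheduling step would need.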