DGSS: A Dependability Guided Job Scheduling System for Grid Environment

Yongcai Tao, Hai Jin, Xuanhua Shi
DOI: https://doi.org/10.1007/978-3-540-72584-8_57
2007-01-01
Abstract: Owing to the diverse failures and error conditions in grid environments, node unavailability is becoming increasingly severe and poses great challenges to reliable job scheduling. Current job management systems mainly rely on fault recovery mechanisms to guarantee job completion, but at the cost of system efficiency. To address these challenges, this paper designs a node TTF (Time To Failure) prediction model and a job completion prediction model. Based on these models, the paper proposes a dependability guided job scheduling system, called DGSS, which provides failure-avoidance job scheduling. The experimental results validate the improvement in the dependability of job execution and in system resource utilization.