Implementing data aware scheduling in Gfarm® using LSF™ scheduler plugin mechanism

WEI Xiaohui, LI Wilfred W., Osamu Tatebe, XU Gaochao, HU Liang, JU Jiubin, Denis A. Nicole
2005-01-01
Abstract: In high energy physics, astronomy, space exploration, genomics and other disciplines, applications that both access and generate large data sets, called data-intensive jobs, are drawing increasing attention. Data Grids such as Gfarm seek to harness geographically distributed resources for such large-scale data-intensive problems. However, scheduling is a challenging task in this context. In this paper, we discuss the design and implementation of a data-aware scheduling and data management system in Gfarm. The system is able to find data-affinity hosts for Gfarm jobs and to adjust the distribution of data replicas dynamically according to the job load. By using the LSF scheduler plugin mechanism, we avoid writing a new scheduler from scratch or making extensive changes to an existing scheduler. Moreover, the new policy can cooperate with other scheduling policies in the system. We describe our experiences with the UNICORE Grid environment. Several lessons of general applicability can be drawn with regard to user uptake and security. The principal lesson is that middleware developers should make more effort to meet the needs of their target user community. Novel workflow strategies, in particular, should not be imposed on an existing community.
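
The two behaviours named in the abstract, finding data-affinity hosts and adjusting replica distribution according to load, can be illustrated with a minimal sketch. This is not the paper's algorithm and does not use the LSF plugin API or any Gfarm call. The function names, the in-memory replica catalog, the load figures and the 0.8 "busy" threshold are all hypothetical, chosen only to make the idea concrete.

```python
from typing import Dict, List, Optional, Set

# Hypothetical in-memory stand-ins: a replica catalog (file -> hosts holding
# a copy) and a per-host load figure between 0.0 (idle) and 1.0 (saturated).
Catalog = Dict[str, Set[str]]
Load = Dict[str, float]


def rank_hosts(inputs: List[str], replicas: Catalog, load: Load) -> List[str]:
    """Order hosts by data affinity (most local input files first),
    breaking ties by lower load and then by host name."""
    def affinity(host: str) -> int:
        return sum(1 for f in inputs if host in replicas.get(f, set()))

    return sorted(load, key=lambda h: (-affinity(h), load[h], h))


def replica_target(file: str, replicas: Catalog, load: Load,
                   busy: float = 0.8) -> Optional[str]:
    """Suggest a host for an extra replica when every current holder is busy.
    Returns None if some holder still has capacity or no spare host exists.
    The `busy` threshold is an assumed tuning knob, not a value from the paper."""
    holders = replicas.get(file, set())
    if not holders or any(load.get(h, 1.0) < busy for h in holders):
        return None
    spare = [h for h in load if h not in holders and load[h] < busy]
    return min(spare, key=lambda h: (load[h], h)) if spare else None


if __name__ == "__main__":
    replicas = {"a.dat": {"n1", "n2"}, "b.dat": {"n2"}}
    load = {"n1": 0.1, "n2": 0.9, "n3": 0.2}
    print(rank_hosts(["a.dat", "b.dat"], replicas, load))
    print(replica_target("b.dat", replicas, load))
```

In this sketch affinity is ranked ahead of load, so a heavily loaded host that holds all the inputs still comes first; the replica suggestion is what relieves that pressure by proposing a copy on a less busy host. A real scheduler plugin would have to combine such a ranking with the other policies active in the system.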