DDGrid: A Grid Computing Environment with Massive Concurrency and Fault-Tolerance Support
Yongjian Wang, Zhongzhi Luan, Depei Qian, Yuanqiang Huang, Ting Chen, Biao Han, Yinan Ren, Kunqian Yu, Hualiang Jiang
DOI: https://doi.org/10.1109/GCC.2008.27
2008-01-01
Abstract: Grid computing is an effective computing paradigm widely used to solve complex problems. A variety of existing grid middleware systems support the operation of grid infrastructures, including CNGrid GOS, EGEE gLite, Globus Toolkit, and OSG Condor. These grid infrastructures focus on encapsulating the underlying computing and storage resources and on providing basic services such as batch job execution, information services, scheduling, and cross-domain security. Other features, such as fault tolerance and massive concurrency support, are vital to the success of real applications, especially complex and long-running ones, but they have not been the focus of current grid systems. DDGrid, a key project supported by CNGrid (China National Grid), aims to establish a grid computing environment that uses computing resources scattered over the Internet to carry out virtual-screening operations, which require more computing power than a single institute or company can afford. In our design and implementation of DDGrid, we propose a master/worker mode that effectively uses the computing resources provided by the underlying grid infrastructure and aims to provide the additional fault-tolerance and massive-concurrency support that real applications need.
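The master/worker mode mentioned in the abstract can be illustrated with a minimal sketch. This is not DDGrid's implementation: it runs on a local thread pool rather than grid resources, and the names `run_master`, `flaky_score`, `max_workers`, and `max_retries` are hypothetical, chosen only for illustration. It shows one common way a master can dispatch many independent tasks concurrently and resubmit a task when its worker fails.

```python
import threading
from concurrent.futures import ThreadPoolExecutor, as_completed


def run_master(tasks, worker, max_workers=4, max_retries=2):
    """Hypothetical master: dispatch tasks to a pool, resubmit failed ones.

    Returns (results, failed), where failed maps a task to its last error.
    Tasks must be hashable because they are used as dictionary keys.
    """
    results, failed = {}, {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Map each in-flight future to (task, attempt number).
        pending = {pool.submit(worker, t): (t, 1) for t in tasks}
        while pending:
            # Iterate over a snapshot; resubmitted futures are handled
            # on the next pass of the outer loop.
            for fut in as_completed(list(pending)):
                task, attempts = pending.pop(fut)
                try:
                    results[task] = fut.result()
                except Exception as exc:
                    if attempts <= max_retries:
                        retry = pool.submit(worker, task)
                        pending[retry] = (task, attempts + 1)
                    else:
                        failed[task] = repr(exc)
    return results, failed


# Hypothetical stand-in for a screening job: the first attempt on every
# even-numbered ligand id fails, simulating an unreliable worker node.
_seen = set()
_lock = threading.Lock()


def flaky_score(ligand_id):
    with _lock:
        first_try = ligand_id not in _seen
        _seen.add(ligand_id)
    if first_try and ligand_id % 2 == 0:
        raise RuntimeError("simulated worker failure")
    return ligand_id * 10
```

Calling `run_master(range(6), flaky_score)` would retry the simulated failures and return a result for every task. A task that keeps failing after `max_retries` resubmissions is reported in the second return value instead of aborting the whole run, which is the kind of behavior long-running screening jobs tend to need.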