MPICH-G-DM: An Enhanced MPICH-G with Supporting Dynamic Job Migration

Xiaohui Wei, Hongliang Li, Dexiong Li
DOI: https://doi.org/10.1109/ChinaGrid.2009.9
2009-01-01
Abstract: The Grid is attracting increasing attention because of its massive computational capacity. Tools such as the Globus Toolkit and MPICH-G2 have been developed to help scientists in their research. As a Grid-enabled implementation of MPI, MPICH-G2 helps developers port parallel applications to cross-domain environments. Because current computation-intensive parallel applications, especially long-running tasks, require a computing platform that offers high availability as well as high performance, dynamic job migration in Grid environments has become an essential issue. In this study, we present MPICH-G-DM, a version of MPICH-G2 that supports dynamic job migration. We use a Virtual Job Model (VJM) to reserve resources for migrating jobs in advance, improving the efficiency of the system. We propose an Asynchronous Migration Protocol (AMP) that enables migrating subjobs to checkpoint/restart and update their new addresses concurrently, without global synchronization. To reduce the communication overhead of job migration, MPICH-G-DM reduces the number of control messages exchanged among domains to O(N). Experimental results show that MPICH-G-DM is effective and reliable.
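The abstract states that MPICH-G-DM keeps inter-domain control messages at O(N) but does not describe how AMP achieves this. As a rough illustration of why that bound matters, the sketch below compares two hypothetical address-update schemes for N migrated subjobs. Both schemes, and the function names `all_to_all_update_messages` and `coordinator_update_messages`, are assumptions made for illustration; neither is presented as the paper's actual protocol.

```python
"""Hypothetical comparison of inter-domain control-message counts.

Neither function reproduces MPICH-G-DM's Asynchronous Migration Protocol;
both are illustrative models under stated assumptions.
"""


def all_to_all_update_messages(n_subjobs: int) -> int:
    """Assumed naive scheme: every migrated subjob sends its new address
    directly to every other subjob, giving n * (n - 1) messages (O(N^2))."""
    return n_subjobs * (n_subjobs - 1)


def coordinator_update_messages(n_subjobs: int) -> int:
    """Assumed hub scheme: each subjob reports its new address once to a
    single coordinator (n messages), then the coordinator sends the merged
    address table back once to each subjob (n messages): 2n total, O(N)."""
    return 2 * n_subjobs


if __name__ == "__main__":
    # Print both counts side by side for a few job sizes.
    for n in (4, 16, 64, 256):
        print(f"N={n:>3}  all-to-all={all_to_all_update_messages(n):>6}  "
              f"coordinator={coordinator_update_messages(n):>4}")
```

Under these assumptions, 256 subjobs would need 65,280 messages with direct all-to-all notification but only 512 with a single relay point, which is the kind of gap a linear bound on cross-domain control traffic is meant to capture.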