A Robust Communication Framework for Parallel Execution on Volunteer PC Grids.

Eshwar Rohit, Hien Nguyen, Nagarajan Kanna, Jaspal Subhlok, Edgar Gabriel, Qian Wang, Margaret S. Cheung, David Anderson
DOI: https://doi.org/10.1109/ccgrid.2011.72
2011-01-01
Abstract: Volunteer PC grids represent massive computation capacity at a low cost, but are challenging to employ for parallel computing because of variable and unpredictable performance and availability. A communicating parallel program must employ explicit redundancy, or implicit redundancy with uncoordinated checkpoint-restart, to make continuous forward progress in such an unreliable environment. A communication model based on one-sided Put/Get calls to an abstract global shared space is a good match, because processes can execute their communication operations independently and asynchronously. However, no existing system is designed for redundant communicating processes. The key problem is that a single logical operation that impacts the global program state may be executed by different instances of the same process at different times, leading to semantic inconsistency. This paper presents the design, execution model, implementation, and usage of Volpex, a communication layer for robust execution on volunteer PC grids. The research leads to a practical way to employ idle PCs for latency-tolerant parallel computing applications.
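
The consistency problem described in the abstract can be illustrated with a small sketch. It is a toy in-memory model, not the Volpex API: the class `SharedSpace`, its methods, and the `rank`/`seq` parameters are hypothetical names introduced here. It assumes each logical Put or Get is identified by the logical process rank plus a per-process sequence number, and shows one possible way a shared space could keep replicas consistent: apply each logical Put once, and replay a logged result for each logical Get. The abstract does not state that Volpex works this way.

```python
class SharedSpace:
    """Toy global shared space that tolerates replicated callers (illustrative only)."""

    def __init__(self):
        self._data = {}         # tag -> current value
        self._put_seen = set()  # (rank, seq) of logical Puts already applied
        self._get_log = {}      # (rank, seq) -> value returned to the first replica

    def put(self, rank, seq, tag, value):
        # Apply a logical Put only once, however many replicas issue it.
        if (rank, seq) in self._put_seen:
            return False
        self._put_seen.add((rank, seq))
        self._data[tag] = value
        return True

    def get(self, rank, seq, tag):
        # The first replica to issue a logical Get fixes its result;
        # slower replicas of the same logical process replay that value.
        key = (rank, seq)
        if key not in self._get_log:
            self._get_log[key] = self._data.get(tag)
        return self._get_log[key]


space = SharedSpace()
space.put(rank=0, seq=1, tag="x", value=10)  # process 0, fast replica
fast = space.get(rank=1, seq=1, tag="x")     # process 1, fast replica
space.put(rank=0, seq=2, tag="x", value=20)  # process 0 moves on
space.put(rank=0, seq=1, tag="x", value=10)  # slow replica of process 0: ignored
slow = space.get(rank=1, seq=1, tag="x")     # slow replica of process 1
print(fast, slow, space._data["x"])
```

Without the two bookkeeping structures, the late Put would overwrite the newer value and the slow replica of process 1 could read a different value than the fast one, which is the semantic inconsistency the abstract refers to.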