Components and interfaces of a process management system for parallel programs

Ralph Butler, William Gropp, Ewing Lusk
DOI: https://doi.org/10.1016/s0167-8191(01)00097-7
IF: 0.983
2001-10-01
Parallel Computing
Abstract: Parallel jobs are different from sequential jobs and require a different type of process management. We present here a process management system for parallel programs such as those written using MPI. A primary goal of the system, which we call MPD (for multipurpose daemon), is to be scalable. By this we mean that startup of interactive parallel jobs comprising thousands of processes is quick, that signals can be quickly delivered to processes, and that stdin, stdout, and stderr are managed intuitively. Our primary target is parallel machines made up of clusters of SMPs, but the system is also useful in more tightly integrated environments. We describe how MPD enables fast startup and convenient runtime management of parallel jobs. We show how close control of stdio can support the easy implementation of a number of convenient system utilities, even a parallel debugger. We describe a simple but general interface that can be used to separate any process manager from a parallel library, which we use to keep MPD separate from MPICH.
computer science, theory & methods