DMTCP: bringing interactive checkpoint–restart to Python

Kapil Arya, Gene Cooperman
DOI: https://doi.org/10.1088/1749-4699/8/1/014005
2015-07-17
Abstract: DMTCP (Distributed MultiThreaded CheckPointing) is a mature checkpoint–restart package. It operates in user space without kernel privilege, and adapts to application-specific requirements through plugins. While DMTCP has been able to checkpoint Python and IPython ‘from the outside’ for many years, a Python module has recently been created to support DMTCP. IPython support is included through a new DMTCP plugin. A checkpoint can be requested interactively within a Python session or under the control of a specific Python program. Further, the Python program can execute specific Python code prior to checkpoint, upon resuming (within the original process) and upon restarting (from a checkpoint image). Applications of DMTCP are demonstrated for: (i) Python-based graphics using virtual network client, (ii) a fast/slow technique to use multiple hosts or cores to check one Cython (Behnel S et al 2011 Comput. Sci. Eng. 13 31–39) computation in parallel, and (iii) a reversible debugger, FReD, with a novel reverse-expression watchpoint feature for locating the cause of a bug.
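The three hook points named in the abstract (prior to checkpoint, upon resuming, upon restarting) can be pictured with a small, self-contained sketch. It does not use the DMTCP Python module; the class `CheckpointHooks`, the function `simulate_checkpoint`, and the event names are hypothetical and serve only to show the order in which user code might run.

```python
# Hypothetical illustration only: this is NOT the DMTCP Python module's API.
from typing import Callable, Dict, List


class CheckpointHooks:
    """Registry of user callbacks for the three events named in the abstract."""

    EVENTS = ("pre_checkpoint", "resume", "restart")

    def __init__(self) -> None:
        self._hooks: Dict[str, List[Callable[[], None]]] = {
            event: [] for event in self.EVENTS
        }

    def register(self, event: str, fn: Callable[[], None]) -> None:
        if event not in self._hooks:
            raise ValueError(f"unknown event: {event}")
        self._hooks[event].append(fn)

    def fire(self, event: str) -> None:
        # Run callbacks in registration order.
        for fn in self._hooks[event]:
            fn()


def simulate_checkpoint(hooks: CheckpointHooks, restarted: bool) -> None:
    """Model one checkpoint followed by either a resume or a restart."""
    hooks.fire("pre_checkpoint")
    # A real checkpointer would write the checkpoint image at this point.
    # Afterwards, execution continues either in the original process
    # (resume) or in a new process created from the image (restart).
    hooks.fire("restart" if restarted else "resume")


log: List[str] = []
hooks = CheckpointHooks()
hooks.register("pre_checkpoint", lambda: log.append("flush buffers"))
hooks.register("resume", lambda: log.append("continue in original process"))
hooks.register("restart", lambda: log.append("reopen connections"))

simulate_checkpoint(hooks, restarted=False)
simulate_checkpoint(hooks, restarted=True)
print(log)
```

The pre-checkpoint callback runs in both scenarios because a checkpoint image is only ever written after it has completed. The resume and restart callbacks are mutually exclusive for a given continuation.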