A Parallel Debugger Based on Cluster Operating System

鄢超,刘淘英,陈国良
2004-01-01
Journal of Computer Research and Development
Abstract: The design of a parallel debugger is indispensable, yet still challenging, in developing tools for parallel environments. This paper focuses on the design and implementation of an actual parallel debugger, DCDB3.0 (Dawning Cluster DeBugger), which has been realized on Dawning 3000 clusters as part of the cluster operating system to be used on Dawning 4000. DCDB3.0 has a typical client/server structure. A friendly user interface is provided, which visualizes the tedious process of debugging. The user interfaces, as clients, can be located far from the server with the aid of DRPC (Dawning remote procedure call), which provides communication between the client end and the server end, and of the task management module, which makes it easy for the client end to execute programs on the server machine. Both DRPC and the task management module, like DCDB3.0, are parts of the cluster operating system. The server end of DCDB3.0 handles debugging processes, receiving debugging commands and sending back results. The scalability of DCDB3.0 is emphasized, meaning that advanced parallel debugging techniques can be added. Replay based on recording wildcard message senders is implemented, and DSM debugging and other techniques are going to be realized. Compared with the former versions, DCDB3.0 is more powerful and more convenient for users.
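The abstract names replay based on recording wildcard message senders but does not describe the mechanism. The following is a minimal sketch of the general idea only, not DCDB3.0's implementation: during a recorded run, each wildcard receive (in MPI terms, a receive posted with `MPI_ANY_SOURCE`) logs which sender it actually matched; during replay, the same receive is forced to match the logged sender, so the nondeterministic message order is reproduced. The `Mailbox` class and the `run` function are hypothetical names introduced for illustration.

```python
class Mailbox:
    """Hypothetical per-process mailbox of (sender, payload) messages."""

    def __init__(self):
        self.pending = []

    def deliver(self, sender, payload):
        # Messages are queued in the order they arrive.
        self.pending.append((sender, payload))

    def recv(self, source=None):
        """Return the first pending message from `source`.

        source=None is a wildcard receive: any sender matches.
        """
        for i, (sender, _payload) in enumerate(self.pending):
            if source is None or sender == source:
                return self.pending.pop(i)
        raise LookupError("no matching message pending")


def run(arrival_order, sender_log=None):
    """Receive every message once and return (senders, payloads).

    With sender_log=None (record mode) every receive is a wildcard and
    the matched senders are returned so they can be saved as the log.
    With a sender_log (replay mode) the k-th receive is restricted to
    the k-th logged sender, whatever the arrival order is this time.
    """
    box = Mailbox()
    for sender, payload in arrival_order:
        box.deliver(sender, payload)

    senders, payloads = [], []
    for k in range(len(arrival_order)):
        source = sender_log[k] if sender_log is not None else None
        sender, payload = box.recv(source)
        senders.append(sender)
        payloads.append(payload)
    return senders, payloads
```

A usage example under the same assumptions: record one run, then replay a second run in which the messages arrive in a different order.

```python
recorded_run = [(2, "b"), (1, "a"), (3, "c")]
log, first = run(recorded_run)             # record mode

later_run = [(1, "a"), (3, "c"), (2, "b")]  # different arrival order
_, replayed = run(later_run, sender_log=log)
assert replayed == first                    # same receive order as recorded
```

Logging only the matched sender, rather than the message contents, keeps the record small; this works when the program is otherwise deterministic, so that each sender produces the same messages on replay.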