Abstract
Supporting the debugging of large-scale, long-running parallel programs requires introducing checkpointing into parallel program debuggers. Setting checkpoints and rolling back to them raises four problems: in-transit (transient) messages, orphan messages, the domino effect, and livelock. Parallel program debugging must also cope with nondeterminism. The proposed deterministic checkpointing method, based on state freezing, avoids three of these four problems (orphan messages, the domino effect, and livelock). In-transit messages are handled by message logging, and the nondeterminism of parallel debugging is handled by a record/replay method. Together, these techniques effectively resolve the problems that arise when a parallel debugger is combined with checkpointing, and the method is clearly structured and easy to implement. Based on this technique, a parallel debugging tool called DENNET was designed and implemented; it can debug parallel programs in rollback mode.
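The abstract names the message problems and the record/replay method but does not give their algorithms. The following is a minimal, hypothetical Python sketch, not DENNET's implementation, of the two ideas the abstract relies on. First, relative to a global checkpoint, a message is in transit if its send is inside the checkpoint but its receive is not, and it is an orphan if the receive is inside but the send is not. Second, record/replay logs which sender each nondeterministic (wildcard) receive matched and forces the same match on re-execution. Modelling events as local positions per process, and the names `Message`, `classify`, `ReceiveLog` and `receive_any`, are assumptions made for illustration only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Message:
    src: int         # sending process
    dst: int         # receiving process
    send_event: int  # position of the send in src's local event sequence
    recv_event: int  # position of the receive in dst's local event sequence


def classify(messages, checkpoint):
    """Split messages into in-transit and orphan messages w.r.t. a global checkpoint.

    Hypothetical model: checkpoint[p] is the number of local events of process p
    captured in its checkpoint, so event position e is inside it iff e < checkpoint[p].
    """
    in_transit, orphans = [], []
    for m in messages:
        sent = m.send_event < checkpoint[m.src]
        received = m.recv_event < checkpoint[m.dst]
        if sent and not received:
            in_transit.append(m)   # must be logged so it can be re-delivered after rollback
        elif received and not sent:
            orphans.append(m)      # makes the saved global state inconsistent
    return in_transit, orphans


class ReceiveLog:
    """Per-process record of which sender each nondeterministic receive matched."""

    def __init__(self):
        self.sources = {}  # process id -> senders, in receive order
        self.cursor = {}   # process id -> index of the next entry to replay

    def record(self, proc, src):
        self.sources.setdefault(proc, []).append(src)

    def next_source(self, proc):
        i = self.cursor.get(proc, 0)
        self.cursor[proc] = i + 1
        return self.sources[proc][i]


def receive_any(pending, proc, log, replay):
    """Wildcard receive for `proc` from its buffer `pending` of (src, payload) pairs.

    Record mode takes the earliest arrival and logs its sender; replay mode
    takes the message from the sender that was logged for this receive.
    """
    if replay:
        src = log.next_source(proc)
        idx = next(i for i, (s, _) in enumerate(pending) if s == src)
    else:
        idx = 0
        log.record(proc, pending[idx][0])
    return pending.pop(idx)


if __name__ == "__main__":
    msgs = [Message(0, 1, send_event=0, recv_event=5),
            Message(0, 1, send_event=2, recv_event=1)]
    in_transit, orphans = classify(msgs, {0: 1, 1: 3})
    print(len(in_transit), len(orphans))  # 1 1

    log = ReceiveLog()
    first_run = [(2, "b"), (1, "a")]   # arrival order in the recorded run
    recorded = [receive_any(first_run, 1, log, replay=False) for _ in range(2)]
    second_run = [(1, "a"), (2, "b")]  # messages arrive in a different order
    replayed = [receive_any(second_run, 1, log, replay=True) for _ in range(2)]
    print(recorded == replayed)  # True
```

In this model, a checkpoint with an empty orphan list is consistent, and restoring it only requires re-delivering the logged in-transit messages; the replay run reproduces the recorded receive order even though the messages arrive differently.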
Source
Journal of Computer Research and Development (计算机研究与发展)
EI
CSCD
Peking University Core Journal
2002, No. 12, pp. 1580-1586 (7 pages)
Funding
Supported by the National Natural Science Foundation of China (69933020)
Keywords
checkpointing
parallel program debugger
design
message passing
state freezing