The Design and Implementation of a Debugger for Parallel Programs Based on Checkpoint (cited by: 6)

Original title: 一种基于检查点的并行程序调试器的设计与实现

Abstract: To support the debugging of large-scale, long-running parallel programs, checkpointing needs to be introduced into parallel program debuggers. Setting checkpoints and rolling back to them raises four problems: in-transit messages, orphan messages, the domino effect, and livelock. Parallel program debugging must additionally deal with nondeterminism. The deterministic checkpointing method based on state freezing proposed here avoids three of the four problems (orphan messages, the domino effect, and livelock). In-transit messages are handled by message logging, and nondeterminism in parallel debugging is handled by record/replay. The method thus addresses the problems that arise when a parallel program debugger is combined with checkpointing, and it has a clear structure and is easy to implement. Based on this technique, a parallel debugging tool named DENNET was designed and implemented; it can debug parallel programs in rollback mode.
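Source: Journal of Computer Research and Development (《计算机研究与发展》), indexed in EI, CSCD, and the Peking University Core Journals list, 2002, No. 12, pp. 1580-1586 (7 pages)
Funding: Supported by the National Natural Science Foundation of China (69933020)
Keywords: checkpoint; parallel program debugger; design; message passing; debugging for parallel programs; state freezing; checkpointing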
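
The abstract distinguishes in-transit messages (handled by logging) from orphan messages (avoided by the checkpointing method). The following minimal sketch illustrates the standard textbook definitions of the two cases relative to a set of per-process checkpoints. It is not taken from DENNET: the `Message` class, the `classify` and `messages_to_log` functions, and the use of local event indices as timestamps are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Message:
    """Hypothetical record of one message between two processes."""
    sender: str
    receiver: str
    send_index: int  # sender's local event index at the send
    recv_index: int  # receiver's local event index at the receive


def classify(msg, checkpoint):
    """Classify a message against a set of checkpoints.

    `checkpoint` maps each process name to the local event index at
    which that process took its checkpoint (an illustrative model).
    """
    sent_before = msg.send_index <= checkpoint[msg.sender]
    received_before = msg.recv_index <= checkpoint[msg.receiver]
    if sent_before and not received_before:
        # The send is in the saved state but the receive is not:
        # after rollback the message would be lost unless it was logged.
        return "in-transit"
    if received_before and not sent_before:
        # The receive is in the saved state but the send is not:
        # after rollback the receiver holds a message nobody has sent.
        return "orphan"
    return "consistent"


def messages_to_log(messages, checkpoint):
    """Return the messages that must be logged so they can be re-delivered."""
    return [m for m in messages if classify(m, checkpoint) == "in-transit"]


checkpoint = {"P0": 5, "P1": 5}
messages = [
    Message("P0", "P1", send_index=3, recv_index=4),
    Message("P0", "P1", send_index=4, recv_index=7),
    Message("P1", "P0", send_index=6, recv_index=5),
]
for m in messages:
    print(m, classify(m, checkpoint))
```

In this model, a checkpoint set with no orphan messages is a consistent recovery line, and only the in-transit messages need to be logged and re-delivered after a rollback.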