A Low-Overhead, Non-Blocking Coordinated Checkpoint Algorithm (cited by: 1)

Coordinated Checkpoint Algorithm of Low-overhead and Non-blocking
Abstract: Coordinated checkpointing with rollback recovery is a simple and effective fault-tolerance technique that is widely used in parallel and distributed systems. To further reduce the overhead of coordinated checkpointing, this paper presents a non-blocking coordinated checkpoint algorithm based on reconstructible checkpoints. In a parallel program, a failure that triggers rollback recovery is far less likely than a checkpoint being taken. The algorithm exploits this property by shifting part of the cost of taking checkpoints to the rollback recovery phase, which lowers the overhead of fault tolerance and improves system scalability.
Source: Computer Engineering (《计算机工程》), indexed in CAS, CSCD, and the Peking University Core Journals list (北大核心); 2007, No. 24, pp. 66-68 (3 pages)
Keywords: checkpoint; fault tolerance; rollback recovery; non-blocking
