摘要
为了用检查点设置及回卷恢复技术提高并行分布式系统容错性能时降低设置检查点的时间和空间开销,提出了一种非阻塞协调检查点算法。与传统的两阶段提交算法不同,该算法是单阶段提交算法,可跳过临时检查点阶段直接获得永久检查点,减少了同步控制消息的数量,加快了检查点的形成时间。它通过发送进程排除孤儿消息,实现了并行计算;通过设置检查点算法启动周期,解决中途消息问题。该算法的时间复杂度由通常的O(n^2)降低到O(n).只需要n-1个同步消息。
In order to reduce the time and space overheads for setting checkpoints when using the checkpomtmg-ronDacx recovery technique to enhance the error-tolerance performance of parallel and distributed systems, a non-blocking cooperative checkpointing algorithm is proposed. It is a one-phase method. Different from the conventional (two- phase) approach, it jumps over the temporary checkpoint phase to obtain permanent checkpoints to reduce the number of controlled messages and the time of checkpoint forming. The proposed checkpointing algorithm allows processes to take permanent checkpoints directly, without taking temporary checkpoints. The character of the algo- rithm contributes to its speed of execution. Orphan messages are eliminated by sender processes and in-transit mes- sages are eliminated by the checkpointing interval and retransmission mechanism. Its time complexity for obtaining checkpoints is O (n) , not the ordinary O (n^2).
出处
《高技术通讯》
CAS
CSCD
北大核心
2012年第12期1243-1249,共7页
Chinese High Technology Letters
基金
重庆市自然科学基金计划(CSTC2008BB2307)资助项目.
关键词
容错
非阻塞检查点
回卷恢复
单阶段提交算法
fault tolerance, non-blocking checkpointing, rollback recovery, single phase commit algorithm