Abstract
The extended graph-oriented distributed programming model (ExGOM) provides a system framework that supports dynamic configuration. Dynamic configuration covers expanding and shrinking the system at run time, upgrading it while it is running, and reconfiguring it after a fault occurs. One problem in reconfiguration after a fault is how to restore the system to the consistent state that existed just before the fault. This paper focuses on that problem and presents an asynchronous checkpoint rollback algorithm and a crash recovery strategy based on fault-sensitive graphs. The algorithm and strategy handle the case in which a single host suffering a transient fault carries multiple faulty processes. Compared with other asynchronous rollback and crash recovery algorithms, this algorithm localizes the fault region and rolls back only the fault-sensitive nodes, which effectively reduces system overhead.
Source
Journal of Software (《软件学报》)
Indexed in: EI, CSCD, Peking University Core Journals
2000, Issue 2, pp. 235-239 (5 pages)
Funding
National 863 High-Tech Program Fund of China (No. 863-306-ZT02-03-01)
The Hong Kong Polytechnic University Research Fund
Keywords
Distributed programming, checkpoint, rollback, crash recovery.