A checkpointing algorithm without file restoring

Abstract: In fault-tolerant computing, checkpointing lets a process recover to a recent state after a failure, which limits the amount of computation lost. A process may, however, operate on external resources during its computation, for example by updating files. If those resources are not restored when the process recovers, the external state the process expects may be inconsistent with the actual state. If they are restored, a considerable amount of already-stored information may be lost, which is equally undesirable. To address this problem, the paper proposes an extended AFS file semantics. Under this semantics, checkpoint placement depends on file state, and no file rollback is needed during process recovery. This reduces the loss of stored information and also allows fast recovery.
Source: Chinese High Technology Letters (《高技术通讯》), 2010, No. 9, pp. 924-928 (5 pages). Indexed in EI, CAS, CSCD, and the Peking University Core Journals list.
Funding: Supported by the 863 Program (2008AA01A204, 2009AA01A404).
Keywords: fault tolerance, distributed file system, checkpoint, file semantics
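
The abstract states the idea but not the algorithm itself. As a rough illustration only, the following Python sketch shows one way checkpointing could be tied to file-state changes so that recovery never rolls files back. It assumes AFS-style session semantics (writes stay local until the file is closed) and assumes that committing a file and taking the checkpoint happen as one atomic step. The class and method names (`FileStore`, `Process`, `close`, `recover`) are hypothetical and are not taken from the paper.

```python
import copy


class FileStore:
    """Committed file contents. In this sketch they are never rolled back."""

    def __init__(self):
        self.files = {}


class Process:
    """Toy process whose checkpoints are driven by file commits (assumption)."""

    def __init__(self, store):
        self.store = store
        self.state = {"step": 0}
        self.pending = {}  # session-local writes, invisible until close()
        self._checkpoint = (copy.deepcopy(self.state), {})

    def compute(self):
        # Stand-in for real work: advance the process state.
        self.state["step"] += 1

    def write(self, name, data):
        # Buffer the write locally, starting from the committed contents.
        base = self.pending.get(name, self.store.files.get(name, ""))
        self.pending[name] = base + data

    def close(self, name):
        # Commit the buffered contents, then checkpoint immediately.
        # Assumption: these two steps are atomic with respect to failures.
        if name in self.pending:
            self.store.files[name] = self.pending.pop(name)
        self._checkpoint = (copy.deepcopy(self.state),
                            copy.deepcopy(self.pending))

    def recover(self):
        # Restore process state only; the file store is left untouched.
        state, pending = self._checkpoint
        self.state = copy.deepcopy(state)
        self.pending = copy.deepcopy(pending)


def demo():
    store = FileStore()
    p = Process(store)
    p.compute()
    p.write("log", "a")
    p.close("log")       # commit "a" and checkpoint at step 1
    p.compute()
    p.write("log", "b")  # buffered only; not yet visible in the store
    p.recover()          # simulated failure: back to step 1, file still "a"
    after_recovery = (p.state["step"], store.files["log"])
    p.compute()
    p.write("log", "b")  # re-execution repeats the uncommitted write
    p.close("log")
    return after_recovery, (p.state["step"], store.files["log"])
```

Because every committed file change coincides with a checkpoint in this sketch, the restored process state always matches the committed file contents, and re-executing after a failure does not apply a write twice. Whether the paper's extended semantics works this way is not stated in the abstract.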
