
Design and Implementation of a Process Checkpointing System Based on the Linux Kernel (cited by: 5)
Abstract: As a popular software fault-tolerance mechanism, checkpoint and recovery can be implemented in two modes: user-level and system-level. The paper first discusses the differences between the two modes. Then, building on the Linux loadable kernel module (LKM) mechanism, it proposes a method for implementing process checkpoint and recovery based on the Linux kernel. A checkpoint and recovery kernel module is implemented using Linux kernel threads, and on top of this module a user-level checkpoint library is constructed to provide corresponding interfaces to users. By combining these interfaces, users can efficiently implement specific checkpoint and recovery algorithms.
Source: Computer Science (《计算机科学》), indexed in CSCD and the Peking University Core Journals list, 2009, No. 4, pp. 192-194, 214 (4 pages)
Funding: Supported by the National Natural Science Foundation of China (60873138)
Keywords: checkpoint and recovery; user level; system level; kernel module; kernel thread

