A Low Cost Cooperative Checkpoint Algorithm

Original Chinese title: 一种低费用的协调检查点算法

Abstract: Checkpointing, an effective fault-tolerance technique, has been widely used in grid, distributed, and cloud computing systems. This paper proposes a non-blocking coordinated checkpoint algorithm that increases system reliability, allows checkpoints to be set flexibly, substantially reduces the number of synchronization messages, and shortens checkpoint formation time. Compared with typical related algorithms, the proposed algorithm uses fewer synchronization control messages and has lower overhead: the time complexity introduced by synchronization control messages falls from the usual O(n²) to O(n), and only n-1 synchronization messages are needed.
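The message counts quoted in the abstract can be illustrated with a small sketch. The abstract does not describe how the algorithm propagates its control messages, so the chain topology and the function names below are assumptions made for illustration only, not the authors' method. The sketch contrasts a scheme in which every process sends a marker on each outgoing channel of a fully connected network (as in a Chandy-Lamport style snapshot) with a hypothetical scheme in which one checkpoint request is forwarded from process to process.

```python
def all_to_all_marker_count(n: int) -> int:
    """Markers sent when each of n processes sends one marker on every
    outgoing channel of a fully connected network: n * (n - 1), i.e. O(n^2)."""
    return n * (n - 1)


def simulate_chain_requests(n: int) -> tuple[int, bool]:
    """Hypothetical O(n) scheme (an assumption, not taken from the paper):
    process 0 initiates, and each process forwards the checkpoint request
    once to the next process in a chain.

    Returns (messages_sent, all_processes_checkpointed).
    """
    if n <= 0:
        return 0, True
    checkpointed = [False] * n
    checkpointed[0] = True          # the initiator checkpoints without a message
    messages = 0
    for pid in range(1, n):
        messages += 1               # one request delivered to process pid
        checkpointed[pid] = True    # pid takes its checkpoint on receipt
    return messages, all(checkpointed)


if __name__ == "__main__":
    for n in (4, 16, 64):
        sent, complete = simulate_chain_requests(n)
        print(n, all_to_all_marker_count(n), sent, complete)
```

Under these assumptions the chain needs exactly n-1 requests, which matches the count stated in the abstract, while the all-to-all scheme grows quadratically with n.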

Authors: DANG Hong-en (党红恩), ZHAO Er-ping (赵尔平), LUO Wei-qun (雒伟群) (School of Information Engineering, Tibet Nationalities Institute, Xianyang 712082, China)

Affiliation: School of Information Engineering, Tibet Nationalities Institute

Source: Computer Knowledge and Technology (《电脑知识与技术》), 2014, No. 4, pp. 2394-2396 (3 pages)

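Funding: State Ethnic Affairs Commission research project (12XZZ002); Natural Science Foundation of Tibet Autonomous Region project (12KJZRYMY07)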

Keywords: checkpoint; distributed system; cloud computing system; fault tolerance

Classification: TP302 [Automation and Computer Technology: Computer System Architecture]




