A Low Cost Cooperative Checkpoint Algorithm
Abstract: Checkpointing, an effective fault-recovery and fault-tolerance technique, is widely used in grid, distributed, and cloud computing systems. This paper proposes a non-blocking coordinated checkpoint algorithm that improves system reliability, allows checkpoints to be set flexibly, substantially reduces the number of synchronization messages, and shortens the time needed to form a checkpoint. Compared with typical related algorithms, the proposed algorithm uses fewer synchronization control messages and has lower overhead: the time complexity introduced by the synchronization control messages drops from the usual O(n^2) to O(n), and only n-1 synchronization messages are required.
Authors: DANG Hong-en, ZHAO Er-ping, LUO Wei-qun (School of Information Engineering, Tibet Nationalities Institute, Xianyang 712082, China)
Source: Computer Knowledge and Technology (《电脑知识与技术》), 2014, No. 4, pp. 2394-2396 (3 pages)
Funding: Research Project of the State Ethnic Affairs Commission (12XZZ002); Natural Science Foundation of the Tibet Autonomous Region (12KJZRYMY07)
Keywords: checkpoint; distributed system; cloud computing system; fault tolerance
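
The abstract quotes the message counts but does not describe the mechanism behind them. As an illustration only, the sketch below compares two hypothetical synchronization patterns: one in which every process sends a control message to every other process, which gives n(n-1) messages, and one in which a checkpoint request travels along a spanning tree so that each non-initiator process receives exactly one request, which gives n-1 messages. The function names, the tree layout, and the all-to-all baseline are assumptions made for this example and are not taken from the paper.

```python
from collections import deque
from typing import Dict, List


def all_to_all_messages(n: int) -> int:
    """Control messages if each of n processes notifies every other process.

    Assumed baseline for illustration; it grows as O(n^2).
    """
    return n * (n - 1)


def tree_request_messages(children: Dict[int, List[int]], initiator: int) -> int:
    """Simulate a checkpoint request spreading down a spanning tree.

    Every process forwards the request once to each of its children, so each
    non-initiator receives exactly one message. Returns the message count.
    """
    sent = 0
    pending = deque([initiator])
    while pending:
        process = pending.popleft()
        for child in children.get(process, []):
            sent += 1  # one request from parent to child
            pending.append(child)
    return sent


def binary_tree(n: int) -> Dict[int, List[int]]:
    """Hypothetical layout: process i has children 2i+1 and 2i+2 when they exist."""
    return {i: [c for c in (2 * i + 1, 2 * i + 2) if c < n] for i in range(n)}


if __name__ == "__main__":
    for n in (4, 8, 16):
        print(n, all_to_all_messages(n), tree_request_messages(binary_tree(n), 0))
```

For any spanning tree over n processes the second count is n-1, because a tree has exactly n-1 edges; the shape of the tree affects latency, not the number of messages.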


