System Management Techniques for a Highly-available Cluster System (cited by: 2)

Abstract: In parallel and distributed computing environments, the probability of system errors rises sharply as system scale grows. To improve the reliability and availability of cluster systems, this work implements a high-availability management scheme that enhances the availability of the job service on head nodes, based on the principle of the symmetric Active/Active high-availability model and a group communication facility. It also presents an implementation of a high-availability management module for compute nodes that takes the characteristics of parallel computing environments into account and uses LAM/Migration checkpoint migration technology, giving the compute nodes of the cluster fault self-detection and task self-recovery capabilities.
Author: 单忠伟
Source: Ship Electronic Engineering (《舰船电子工程》), 2010, No. 3, pp. 23-26 (4 pages)
Keywords: parallel computing; high availability; checkpoint; process migration