Research on a General Grid Fault-Tolerance Framework (通用网格容错框架研究). Cited by: 4

A general fault-tolerance framework for grid computing
Abstract: To meet the reliability requirements of grid computing, this paper proposes a general fault-tolerance framework with two parts: grid fault detection and grid fault handling. The framework combines hierarchically organized fault detection services with a policy-based, general-purpose fault-handling method. At the bottom of the hierarchy, local fault detectors monitor the objects in their local area, collect their status information, and send heartbeat messages to a middle-layer data collector. The middle-layer data collector sends the status list of the monitored objects to the top-layer data collectors at a specific interval, and the top-layer data collectors are managed by an index server. When a fault is detected, the system uses decision analysis to choose an appropriate fault-handling method, such as checkpointing, retrying, or replication. Performance evaluation shows that the framework is scalable and efficient and incurs low overhead.
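Source: Journal of Huazhong University of Science and Technology (Natural Science Edition) (《华中科技大学学报(自然科学版)》), indexed in EI, CAS, CSCD, and the Peking University Core Journals list; 2006, No. 7, pp. 42-45 (4 pages).
Funding: Major Program of the National Natural Science Foundation of China (90412010); ChinaGrid project of the China Education and Research Grid program (CG2003-CG001).
Keywords: fault detection; fault-tolerance; policy-based fault-handling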
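
The three-layer detection hierarchy and the policy-based handling step described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the class names, the `timeout` parameter, the explicit `now` timestamps, and the dictionary-based policy table are all assumptions introduced for illustration, and the index server that manages the top-layer collectors is omitted.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class LocalFaultDetector:
    """Bottom layer: records the last heartbeat time of each monitored object."""
    timeout: float  # assumed: seconds of silence after which an object is suspected
    last_seen: Dict[str, float] = field(default_factory=dict)

    def heartbeat(self, obj_id: str, now: float) -> None:
        # Called whenever a monitored object reports in.
        self.last_seen[obj_id] = now

    def status(self, now: float) -> Dict[str, bool]:
        # True means alive, False means suspected faulty.
        return {o: (now - t) <= self.timeout for o, t in self.last_seen.items()}


@dataclass
class MiddleCollector:
    """Middle layer: merges the reports of its detectors into one status list."""
    detectors: List[LocalFaultDetector]

    def status_list(self, now: float) -> Dict[str, bool]:
        merged: Dict[str, bool] = {}
        for detector in self.detectors:
            merged.update(detector.status(now))
        return merged


@dataclass
class TopCollector:
    """Top layer: aggregates the status lists of the middle collectors."""
    middles: List[MiddleCollector]

    def failed_objects(self, now: float) -> List[str]:
        failed: List[str] = []
        for middle in self.middles:
            failed.extend(o for o, alive in middle.status_list(now).items() if not alive)
        return sorted(failed)


def choose_handler(policy: Dict[str, str], obj_id: str, default: str = "retry") -> str:
    """Hypothetical policy lookup: map a failed object to a handling method."""
    return policy.get(obj_id, default)


if __name__ == "__main__":
    detector = LocalFaultDetector(timeout=5.0)
    detector.heartbeat("job-a", now=0.0)   # goes silent afterwards
    detector.heartbeat("job-b", now=8.0)   # still reporting
    top = TopCollector([MiddleCollector([detector])])

    # The handling methods named here are the ones listed in the abstract.
    policy = {"job-a": "checkpoint", "job-b": "replicate"}
    for obj in top.failed_objects(now=10.0):
        print(obj, "->", choose_handler(policy, obj))
```

Passing timestamps in explicitly keeps the sketch deterministic; a real deployment would read a clock and push status lists at the configured interval instead of being polled.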