
Fault-Tolerant Grid Architecture and Practice (cited by: 4)

Abstract: Grid computing has emerged as an effective technology for coupling geographically distributed resources and solving large-scale computational problems over wide area networks. Fault tolerance is a significant and complex issue in grid computing systems. Various techniques have been investigated to detect and correct faults in distributed computing systems, and unreliable fault detection is among the most effective of them. Globus is a grid middleware that manages resources in a wide area network. The Globus fault detection service uses well-known techniques based on unreliable fault detectors to detect and report component failures. However, more powerful techniques are required to detect and correct both system-level and application-level faults in a grid system, and a convenient toolkit is also needed to maintain consistency in the grid. This paper presents a fault-tolerant grid platform (FTGP) based on an unreliable fault detector and the Globus fault detection service. The platform offers strategies in three areas: key grid components, user tasks, and high-level applications.
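To make the term "unreliable fault detector" concrete, the following is a minimal sketch of a timeout-based detector driven by heartbeats. It is not the FTGP or Globus implementation; the class name `HeartbeatDetector`, its methods, the component names, and the 5-second timeout are all hypothetical and chosen only for illustration. The key property it shows is that a component is merely suspected, and the suspicion can be withdrawn when a late heartbeat arrives.

```python
from dataclasses import dataclass, field
from typing import Dict, Set


@dataclass
class HeartbeatDetector:
    """Timeout-based unreliable failure detector (illustrative sketch only)."""
    timeout: float  # seconds of silence before a component is suspected
    last_seen: Dict[str, float] = field(default_factory=dict)

    def heartbeat(self, component: str, now: float) -> None:
        # Record the latest heartbeat time; this also clears any suspicion.
        self.last_seen[component] = now

    def suspects(self, now: float) -> Set[str]:
        # A component is suspected, not known to have failed: a slow network
        # can delay heartbeats, so the detector may be wrong.
        return {c for c, t in self.last_seen.items() if now - t > self.timeout}


# Hypothetical component names; time is passed in explicitly so the
# example is deterministic.
detector = HeartbeatDetector(timeout=5.0)
detector.heartbeat("gatekeeper", now=0.0)
detector.heartbeat("scheduler", now=0.0)
detector.heartbeat("scheduler", now=4.0)
suspected_at_6 = detector.suspects(now=6.0)  # gatekeeper silent for 6 s
detector.heartbeat("gatekeeper", now=7.0)    # late heartbeat arrives
suspected_at_8 = detector.suspects(now=8.0)  # suspicion is withdrawn
```

In this sketch the detector wrongly suspects `gatekeeper` at time 6 and corrects itself at time 8, which is the kind of mistake an unreliable detector is allowed to make.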
Source: Journal of Computer Science & Technology (计算机科学技术学报, English edition), indexed in SCIE, EI, CSCD; 2003, Issue 4, pp. 423-433 (11 pages)
Funding: National Natural Science Foundation of China

References (22)

  • 1. Stallings W. SNMP and SNMPv2: The infrastructure for network management. IEEE Communications Magazine, Mar. 1998, 36(3): 37-43.
  • 2. Armstrong R, Gannon D, Geist A et al. Toward a common component architecture for high performance scientific computing. In Proc. the 8th IEEE Symposium on High Performance Distributed Computing, Redondo Beach, CA, Aug. 1999, pp.115-124.
  • 3. Dongarra J. An overview of computational grids and survey of a few research projects. In Proc. Symposium on Global Information Processing Technology, Tokyo, Japan, 1999.
  • 4. Johnston W E, Gannon D, Nitzberg B. Grids as production computing environments: The engineering aspects of NASA's Information Power Grid. In Proc. the 8th IEEE Symposium on High Performance Distributed Computing, Redondo Beach, CA, 1999, pp.197-204.
  • 5. Angulo D, Aydt R, Berman F et al. Toward a framework for preparing and executing adaptive Grid programs. In Proc. IPDPS'02, Fort Lauderdale, FL, 2002, pp.171-175.
  • 6. Chandra T D, Toueg S. Unreliable failure detectors for reliable distributed systems. Journal of the ACM, Mar. 1996, 43(2): 225-267.
  • 7. Foster I, Kesselman C. Globus: A metacomputing infrastructure toolkit. International Journal of Supercomputer Applications, 1997, 11(2): 115-128.
  • 8. Stelling P, DeMatteis C, Foster I et al. A fault detection service for wide area distributed computations. Cluster Computing, 1999, 2: 117-128.
  • 9. Eugster P T, Guerraoui R, Handurukande S et al. Lightweight probabilistic broadcast. In Proc. the 2001 IEEE International Conference on Dependable Systems and Networks, San Francisco, CA, June 2001, pp.443-452.
  • 10. Guerraoui R, Schiper A. Genuine atomic multicast in asynchronous distributed systems. Theoretical Computer Science, Mar. 2001, 254(1-2): 297-316.

