Modeling and Analysis for Grid Service Reliability Considering Node Recovery
Cited by: 7
Abstract: Some "large tasks" run for a long time and occupy substantial computing resources, and for these tasks node reliability decays exponentially as execution time increases. To address this problem, the paper presents a grid service reliability model that accounts for the ability of nodes to recover from failures. A node failure recovery mechanism is introduced into the service reliability analysis of grid systems with a star topology. The model also takes into account how node software reliability affects grid service reliability. Service reliability is further improved in two ways: the service is divided into subtasks that are processed in parallel on different resources, and redundant copies of subtasks are assigned. Simulation results on a numerical example confirm that introducing node failure recovery improves grid service reliability. The approach therefore offers an effective way to address the low reliability of "large tasks".
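The abstract does not state the paper's equations. The following is only a minimal Python sketch of the kind of calculation it describes, built on assumptions that are ours, not the authors'. We assume a central resource management system (RMS) connected to each node by its own link, as in a star topology. We also assume that hardware, software, and link failures are independent Poisson processes. Recovery is modeled as each failure being recovered with a fixed probability `p_rec`, with negligible recovery time. Replicas are assumed to fail independently. All function names, parameters, and numbers are hypothetical.

```python
import math
from typing import Sequence

# Hypothetical sketch, not the paper's model. Assumptions: hardware, software
# and link failures are independent Poisson processes; each failure is
# recovered with probability p_rec and recovery time is negligible.


def exec_time(work: float, speed: float) -> float:
    """Processing time of a subtask: workload divided by node speed."""
    return work / speed


def node_reliability(lam_hw: float, lam_sw: float, tau: float,
                     p_rec: float = 0.0) -> float:
    """Probability that a node finishes a subtask lasting tau.

    Only unrecovered failures abort the subtask. Thinning a Poisson process
    of rate (lam_hw + lam_sw) with keep-probability (1 - p_rec) gives
    R = exp(-(lam_hw + lam_sw) * (1 - p_rec) * tau).
    """
    return math.exp(-(lam_hw + lam_sw) * (1.0 - p_rec) * tau)


def link_reliability(lam_link: float, data: float, bandwidth: float) -> float:
    """Probability that the RMS-to-node link survives the data transfer."""
    return math.exp(-lam_link * data / bandwidth)


def redundant_subtask(replica_rels: Sequence[float]) -> float:
    """Subtask copied to several nodes: succeeds if at least one copy does."""
    all_fail = 1.0
    for r in replica_rels:
        all_fail *= 1.0 - r
    return 1.0 - all_fail


def service_reliability(subtask_rels: Sequence[float]) -> float:
    """Subtasks run in parallel; the service succeeds only if all succeed."""
    return math.prod(subtask_rels)


if __name__ == "__main__":
    tau = exec_time(1e5, 10.0)                 # a hypothetical "large" subtask
    link = link_reliability(1e-4, 50.0, 10.0)
    for p in (0.0, 0.9):
        r = node_reliability(1e-5, 5e-6, tau, p_rec=p) * link
        both = redundant_subtask([r, r])       # two independent copies
        print(f"p_rec={p}: one copy {r:.4f}, two copies {both:.4f}, "
              f"service of 3 subtasks {service_reliability([both] * 3):.4f}")
```

With these made-up rates, the long subtask has λτ = 0.15. Its node-level reliability is therefore e^{-0.15} ≈ 0.861 without recovery and e^{-0.015} ≈ 0.985 with p_rec = 0.9. Under this assumption, recovery simply scales the effective failure rate by (1 - p_rec). The gain is largest when λτ is large, which is exactly the "large task" case the abstract targets.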
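Source: Journal of Xi'an Jiaotong University, 2008, (6): 693-697, 790 (6 pages). Indexed in EI, CAS, CSCD; Peking University Core Journal.
Funding: National Natural Science Foundation of China (59685003); Foundation for the Author of National Excellent Doctoral Dissertation of China (200232)
Keywords: grid; service reliability; node recovery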

References (11)

  • [1] FOSTER I, KESSELMAN C. The grid 2: blueprint for a new computing infrastructure [M]. San Francisco, CA, USA: Morgan Kaufmann, 2004.
  • [2] FOSTER I. The grid: a new infrastructure for 21st century science [J]. Physics Today, 2002, 55(2): 42-47.
  • [3] DAI Yuansun, XIE Min, POH K L. Reliability of grid service systems [J]. Computers and Industrial Engineering, 2006, 50(1/2): 130-147.
  • [4] LEVITIN G, DAI Yuansun. Service reliability and performance in grid system with star topology [J]. Reliability Engineering and System Safety, 2007, 92(1): 40-46.
  • [5] DAI Yuansun, LEVITIN G, WANG Xiaolong. Optimal task partition and distribution in grid service system with common cause failures [J]. Future Generation Computer Systems, 2007, 23(2): 209-218.
  • [6] YANG Bo, XIE Min. A study of operational and testing reliability in software reliability analysis [J]. Reliability Engineering and System Safety, 2000, 70(3): 323-329.
  • [7] JOZSEF K, PETER K. A migration framework for executing parallel programs in the grid [C]// Proceedings of the 2nd European Across Grids Conference. Berlin, Germany: Springer-Verlag, 2004: 80-89.
  • [8] QIU Min, GUI Xiaolin. A fault-tolerant grid architecture for reliable computing [J]. Microelectronics & Computer, 2005, 22(7): 99-102. (in Chinese)
  • [9] HEDDAYA A, HELAL A. Reliability, availability, dependability and performability: a user-centered view [EB/OL]. [2006-11-08]. http://www.cs.bu.edu/techreports/pdf/1997-011-reliability-def.pdf.
  • [10] XIE Min. Software reliability modeling [M]. Singapore: World Scientific Publishing Company, 1991.

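Citation statistics: 6 documents sharing references with this article; 122 documents co-cited with it; 7 citing documents; 27 second-level citing documents.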