
Research and Implementation of Database-Based Cluster Checkpointing (cited by: 2)

Research and Implementation of Cluster Computing Based on Database
Abstract: This paper proposes a new application-level checkpointing scheme for cluster computing that differs from existing schemes in two ways. First, it uses a relational database system, rather than files, to store the cluster's checkpoints, management data, and resource information. This makes the data easier to index and normalize, and when the data volume is very large, access through the database is faster than access through the file system. Second, it uses a standalone server, so that checkpointing and related operations have minimal impact on the cluster's own computation. For fault tolerance, this standalone management server is mirrored, which is preferable to mirroring every compute node in the cluster.
Source: Journal of Chinese Computer Systems (《小型微型计算机系统》), CSCD, Peking University Core Journal, 2002, No. 3, pp. 257-261 (5 pages)
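Funding: Supported by the Ministry of Education project "Advanced Computing Infrastructure (ACI)" and by the National Climbing Program research project "Research on Several Key Technical Problems of High-Performance Computers".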
Keywords: database system (RDBMS); performance analysis tools; task migration; cluster checkpoint; WEB GUI
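
The storage idea in the abstract, checkpoints kept as indexed rows in a relational database rather than as files, can be illustrated with a minimal sketch. This is not the paper's implementation: the table layout, the function names (`open_store`, `save_checkpoint`, `load_latest`), and the use of SQLite and `pickle` are all assumptions made for illustration. The paper describes a standalone, mirrored database server, for which an in-process SQLite database is only a stand-in.

```python
import pickle
import sqlite3
import time

# Hypothetical schema: one row per checkpoint, indexed by job and step so the
# latest checkpoint of a job can be found without scanning every row.
SCHEMA = """
CREATE TABLE IF NOT EXISTS checkpoints (
    id         INTEGER PRIMARY KEY AUTOINCREMENT,
    job_id     TEXT    NOT NULL,
    node       TEXT    NOT NULL,
    step       INTEGER NOT NULL,
    created_at REAL    NOT NULL,
    state      BLOB    NOT NULL
);
CREATE INDEX IF NOT EXISTS idx_ckpt_job_step ON checkpoints (job_id, step);
"""


def open_store(path=":memory:"):
    """Open (and if needed initialize) the checkpoint store."""
    conn = sqlite3.connect(path)
    conn.executescript(SCHEMA)
    return conn


def save_checkpoint(conn, job_id, node, step, state):
    """Serialize application state and store it as one row, in one transaction."""
    with conn:  # commits on success, rolls back if the insert raises
        conn.execute(
            "INSERT INTO checkpoints (job_id, node, step, created_at, state) "
            "VALUES (?, ?, ?, ?, ?)",
            (job_id, node, step, time.time(), pickle.dumps(state)),
        )


def load_latest(conn, job_id):
    """Return (step, node, state) for the newest checkpoint of a job, or None."""
    row = conn.execute(
        "SELECT step, node, state FROM checkpoints "
        "WHERE job_id = ? ORDER BY step DESC LIMIT 1",
        (job_id,),
    ).fetchone()
    if row is None:
        return None
    step, node, blob = row
    # pickle is used only for brevity; it must not be used on untrusted data.
    return step, node, pickle.loads(blob)


if __name__ == "__main__":
    conn = open_store()
    save_checkpoint(conn, "job-1", "node-a", 1, {"iteration": 100})
    save_checkpoint(conn, "job-1", "node-a", 2, {"iteration": 200})
    # A different node could call load_latest to resume the job from step 2.
    print(load_latest(conn, "job-1"))
```

Because the application chooses what goes into `state`, this is application-level checkpointing in the sense the abstract uses. Restoring a job on another node, as in the task migration named in the keywords, would only require that node to reach the same database.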


