Research and Implementation of Database-Based Cluster Checkpointing (cited by: 2)

Research and Implementation of Cluster Computing Based on Database

Abstract: This paper proposes a new application-level checkpointing scheme for cluster computing that differs from existing approaches in two ways. First, it uses a relational database system, rather than files, to store the cluster's checkpoints, management data, and resource information. This makes the data easier to index and normalize, and when the data volume is very large, database-backed access is faster than file-system access. Second, it runs the database on a separate server, so that checkpointing and related operations affect the cluster's own computation as little as possible. For fault tolerance, this standalone management server is mirrored, which is preferable to mirroring every compute node in the cluster.
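The abstract does not give the paper's schema or API, so the following is only a minimal sketch of the idea of keeping application-level checkpoints in a relational database rather than in files. Everything named here is an assumption made for illustration: the `checkpoint` table, its columns, and the helpers `open_store`, `save_checkpoint`, and `load_latest` are hypothetical, and SQLite stands in for whatever RDBMS would run on the separate management server.

```python
import pickle
import sqlite3

# Hypothetical schema; the paper's actual tables are not described in the abstract.
SCHEMA = """
CREATE TABLE IF NOT EXISTS checkpoint (
    job_id TEXT    NOT NULL,   -- cluster job the state belongs to
    node   TEXT    NOT NULL,   -- compute node that produced the checkpoint
    seq    INTEGER NOT NULL,   -- checkpoint sequence number
    state  BLOB    NOT NULL,   -- serialized application state
    PRIMARY KEY (job_id, node, seq)  -- also serves as the lookup index
);
"""


def open_store(path=":memory:"):
    """Open (or create) the checkpoint store and make sure the table exists."""
    conn = sqlite3.connect(path)
    conn.executescript(SCHEMA)
    return conn


def save_checkpoint(conn, job_id, node, seq, state):
    """Store one checkpoint inside a transaction."""
    with conn:  # commits on success, rolls back if an exception is raised
        conn.execute(
            "INSERT INTO checkpoint (job_id, node, seq, state) VALUES (?, ?, ?, ?)",
            (job_id, node, seq, pickle.dumps(state)),
        )


def load_latest(conn, job_id, node):
    """Return (seq, state) for the newest checkpoint, or None if there is none."""
    row = conn.execute(
        "SELECT seq, state FROM checkpoint "
        "WHERE job_id = ? AND node = ? ORDER BY seq DESC LIMIT 1",
        (job_id, node),
    ).fetchone()
    if row is None:
        return None
    return row[0], pickle.loads(row[1])


if __name__ == "__main__":
    store = open_store()
    save_checkpoint(store, "job1", "node0", 1, {"iteration": 10})
    save_checkpoint(store, "job1", "node0", 2, {"iteration": 20})
    print(load_latest(store, "job1", "node0"))
```

The composite primary key is what lets the database find the latest checkpoint for a job and node through an index rather than by scanning files, which is the kind of advantage the abstract attributes to the database approach. A restarted or migrated task would call `load_latest` and resume from the returned state.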

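Authors: Wu Jianfeng (武剑锋), Ge Yi (戈弋), Li Sanli (李三立)

Affiliation: Department of Computer Science and Technology, Tsinghua University

Source: Journal of Chinese Computer Systems (《小型微型计算机系统》), indexed in CSCD and the Peking University Core Journals list, 2002, No. 3, pp. 257-261 (5 pages)

Funding: Supported by the Ministry of Education project "Advanced Computing Infrastructure (ACI)" and by the National Climbing Program research project "Research on Several Key Technical Problems of High-Performance Computers"

Keywords: cluster; checkpoint; RDBMS (database system); performance analysis tools; Web GUI; task migration

Classification: TP311.13 [Automation and Computer Technology: Computer Software and Theory]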

Citation network
Related literature

References (4)

1. Plank J, Beck M, Kingsley G, Li K. Libckpt: Transparent checkpointing under Unix[C]. USENIX Winter Technical Conference, 1995.
2. Jim Pruyne, Miron Livny. Managing checkpoints for parallel programs[C]. Workshop on Job Scheduling Strategies for Parallel Processing, IPPS '96.
3. Volker Strumpen, Balkrishna Ramkumar. Portable checkpointing for heterogeneous architectures[A]. In: Dimiter R. Avresky, David R. Kaeli, eds. Fault-Tolerant Parallel and Distributed Systems[M]. Kluwer Academic Publishers, 1998, chapter 4: 73-92.
4. Igor Lyubashevskiy, Volker Strumpen. Compiler-supported portable, fault-tolerant file-I/O[J]. In: IEEE International Workshop on Embedded Fault-Tolerant Systems, Boston, MA, May 1998: 49-56.

Co-cited literature (17)

1. Birman K P. Building Secure and Reliable Network Applications[M]. Englewood Cliffs, NJ: Prentice Hall, 1996.
2. Jim Gray. Why do computers stop and what can be done about it? TR-85.7[R]. Palo Alto, USA: Hewlett Packard Labs, 1995.
3. Jiang Ying. Detailed design report on application management for the Dawning 4000A base components, TR-DWARE[R]. Beijing: National Research Center for Intelligent Computing Systems, Institute of Computing Technology, Chinese Academy of Sciences, 2006.
4. Krishnan S, Gannon D. Checkpoint and restart for distributed components in XCAT3[C]//Proc of the 5th IEEE/ACM Int Workshop on Grid Computing. Los Alamitos, CA: IEEE Computer Society, 2004: 46-59.
5. Eilam T, Kalantar M. Model-based automation of service deployment in a constrained environment, TR-345.1[R]. Yorktown Heights, NY, USA: IBM Watson Research Center, 2004.
6. Gao Wen, Chen Mingyu. A faster checkpointing and recovery algorithm with a hierarchical storage approach[C]//Proc of the 8th Int Conf on High-Performance Computing in Asia-Pacific Region. Los Alamitos, CA: IEEE Computer Society, 2005: 78-86.
7. Nakamura H, Hayashida T. Skewed checkpointing for tolerating multi-node failures[C]//Proc of the 23rd IEEE Int Symp on Reliable Distributed Systems. Los Alamitos, CA: IEEE Computer Society, 2004: 67-79.
8. Gribble S D. A design framework and a scalable storage platform to simplify Internet service construction[D]. Berkeley, California: University of California, Berkeley, 2000.
9. Zhan Jianfeng, Sun Ninghui. Fire Phoenix cluster operating system kernel and its evaluation[C]//Proc of IEEE Int Conf on Cluster Computing 2005. Los Alamitos, CA: IEEE Computer Society, 2005: 98-112.
10. Coulouris George, Dollimore Jean, Kindberg Tim. Distributed Systems: Concepts and Design[M]. 4th ed. Reading, MA: Addison Wesley, 2008.

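Citing literature (2)

1. Qin Hang, Xu Jie. A new checkpoint-based fault-tolerance strategy for file system metadata[J]. Computer Engineering and Design, 2004, 25(3): 334-336. Cited by: 3
2. Liang Yi, Wang Lei, Fan Jianping, Fang Juan. Research on a shared-memory-based checkpointing mechanism for cluster services[J]. Journal of Computer Research and Development, 2010, 47(4): 571-580. Cited by: 3

Second-level citing literature (6)

1. Luo Qiuming, Ouyang Kai. Design and implementation of a parallelized PVFS metadata server[J]. Computer Engineering, 2006, 32(12): 50-51. Cited by: 1
2. Ma Jun, Yang Shujun. Research on high-performance cluster file systems[J]. Computer Engineering and Design, 2006, 27(13): 2400-2402. Cited by: 1
3. Ju Jinwu, Wang Lanying. An analysis of the NTFS file system[J]. Computer Engineering and Design, 2007, 28(22): 5437-5439. Cited by: 14
4. Guo Ruifeng, Liu Xian, Ding Wanfu. Research on a fault-tolerant real-time scheduling algorithm with rollback recovery based on a priority-demotion strategy[J]. Journal of Electronics & Information Technology, 2012, 34(2): 474-480. Cited by: 1
5. Liu Xian, Guo Ruifeng, Ding Wanfu. Schedulability of fault-tolerant real-time systems with rollback recovery based on a mixed priority strategy[J]. Journal of Jilin University (Engineering and Technology Edition), 2012, 42(5): 1243-1250. Cited by: 3
6. Guo Ruifeng, Liu Xian, Ding Wanfu, Li Jie, Wang Hongliang. Schedulability analysis of fault-tolerant real-time systems under the rollback recovery model[J]. Journal of Chinese Computer Systems, 2013, 34(6): 1334-1338. Cited by: 2

1. Liang Yi, Di Ruihua, Wang Lei, Zhan Jianfeng. A group management protocol for cluster checkpointing[J]. Journal of Beijing University of Technology, 2010, 36(6): 833-839.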
