Design and Implementation of a Cluster Monitor System Based on Asynchronous Clock

Abstract: A cluster monitoring system is software that manages a cluster and makes it easier for users to work with. To address the shortcomings of existing cluster management systems in timeliness and robustness, this paper proposes a server monitoring technique based on asynchronous clocks. The specified set of servers is organized into a ring queue with a feedback mechanism, so that users can manage the cluster as a single entity. The system can add and remove server nodes transparently, and it reconfigures itself automatically to achieve high availability. Thread pooling and I/O multiplexing are used to improve the response time of the monitoring system. Test results show that the system takes appropriate action in response to transmission delay, packet loss rate, system load, and server or network failures, detecting and clearing faults within a short time. It is well suited to transaction-oriented cluster systems such as mail servers and web servers.
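The abstract gives no implementation details, so the following is only a minimal sketch of the general idea: servers kept in a ring, probed concurrently through a thread pool, with unresponsive nodes dropped so that the ring reconfigures itself. The class `MonitorRing`, the `probe` callback, and the `max_misses` threshold are all hypothetical names chosen for illustration, not the authors' design. I/O multiplexing and the asynchronous-clock timing model are not shown.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List


class MonitorRing:
    """Hypothetical ring of monitored servers (illustration only)."""

    def __init__(self, nodes: List[str], probe: Callable[[str], bool],
                 max_misses: int = 2) -> None:
        self.nodes = list(nodes)      # ring order; the last node wraps to the first
        self.probe = probe            # assumed health check: True if the node answers
        self.max_misses = max_misses  # consecutive failed probes before removal
        self.misses: Dict[str, int] = {n: 0 for n in self.nodes}

    def successor(self, node: str) -> str:
        """Return the next node in ring order."""
        i = self.nodes.index(node)
        return self.nodes[(i + 1) % len(self.nodes)]

    def add(self, node: str) -> None:
        """Join a node at the end of the ring; existing members are untouched."""
        if node not in self.misses:
            self.nodes.append(node)
            self.misses[node] = 0

    def remove(self, node: str) -> None:
        """Drop a node; its predecessor's successor becomes the next live node."""
        if node in self.misses:
            self.nodes.remove(node)
            del self.misses[node]

    def round(self, pool: ThreadPoolExecutor) -> List[str]:
        """Probe every node concurrently and return the nodes removed this round."""
        targets = list(self.nodes)
        results = list(pool.map(self.probe, targets))
        removed = []
        for node, alive in zip(targets, results):
            if alive:
                self.misses[node] = 0
            else:
                self.misses[node] += 1
                if self.misses[node] >= self.max_misses:
                    self.remove(node)
                    removed.append(node)
        return removed


if __name__ == "__main__":
    down = {"mail2"}  # simulated failed server
    ring = MonitorRing(["mail1", "mail2", "web1"], probe=lambda n: n not in down)
    with ThreadPoolExecutor(max_workers=4) as pool:
        ring.round(pool)            # first miss for mail2: still a member
        removed = ring.round(pool)  # second miss: mail2 is dropped
    print(removed, ring.nodes, ring.successor("mail1"))
```

Requiring several consecutive misses before removal is one simple way to tolerate transient delay or packet loss; a real detector would more likely adapt its timeouts to measured latency.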

Authors: Liu Guangtao (刘广涛), Shu Jiwu (舒继武), Zheng Weimin (郑纬民)

Affiliation: Department of Computer Science and Technology, Tsinghua University

Source: Journal of Chinese Computer Systems (《小型微型计算机系统》), CSCD, Peking University Core Journal, 2005, No. 9, pp. 1617-1620 (4 pages)

Funding: Supported by the National "863" Key Program (2001AA11110, 2004AA111120).

Keywords: cluster; asynchronous clock; membership protocol; high availability; remote procedure call

Classification: TP393 [Automation and Computer Technology: Computer Application Technology]
