Fault Tolerance and Recovery for Group Communication Services in Distributed Networks (Cited by: 1)


Abstract: Group communication services (GCSs) are becoming increasingly important as a wide range of promising applications has emerged to serve millions of users distributed across the world. However, it is challenging to make such a service fault-tolerant and scalable enough to meet the heavy demand of users in a distributed network (DN). Many reliable group communication protocols have been designed to address this challenge and accommodate changes in the network, but they are often costly or require complicated strategies to handle the service interruptions caused by node departures or link failures, which limits their practicality. In this paper, we present two schemes to address these challenges. The first is a location-aware replication scheme called NS, which places replicas in a dispersed fashion so that the services on nodes can tolerate failures with different patterns (e.g., network partition and single point of failure) while keeping replication overhead low. The second is a novel failure recovery scheme that exploits the independence between service recovery and structure recovery in the time domain to achieve quick failure recovery. Our simulation results indicate that the two proposed schemes outperform the existing schemes and simple alternative schemes in service success rate, recovery latency, and communication cost.

Authors: 王跃华 (Wang Yuehua), 周忠 (Zhou Zhong), Ling Liu, 吴威 (Wu Wei)

Affiliations: State Key Laboratory of Virtual Reality Technology and Systems; School of Computer Science and Engineering; College of Computing

Source: Journal of Computer Science & Technology (计算机科学技术学报, English edition), indexed in SCIE, EI, CSCD; 2012, Issue 2, pp. 298-312 (15 pages)

Funding: Supported by a National Science Foundation (NSF) grant from the CISE NetSE Program and the CyberTrust Cross-Cutting Program of the USA; an IBM faculty award; an IBM SUR grant; a grant from the Intel Research Council; the National Basic Research 973 Program of China under Grant No. 2009CB320805; the National Natural Science Foundation of China under Grant No. 61170188; the National High Technology Research and Development 863 Program of China under Grant No. 2012AA011803; the Fundamental Research Funds for the Central Universities of China; and the China Scholarship Council (CSC).

Keywords: fault tolerance, failure recovery, replication, location, group communication

Classification: TP393.09 [Automation and Computer Technology: Computer Application Technology]


