
Fault Tolerance and Recovery for Group Communication Services in Distributed Networks (cited by: 1)

Abstract: Group communication services (GCSs) are becoming increasingly important as a wide range of promising applications has emerged to serve millions of users distributed across the world. However, it is challenging to make such a service fault tolerant and scalable enough to meet the voluminous demand of users in a distributed network (DN). While many reliable group communication protocols have been devised to address this challenge and accommodate changes in the network, they are often costly or require complicated strategies to handle the service interruptions caused by node departures or link failures, which hinders their practicality. In this paper, we present two schemes to address these challenges. The first is a location-aware replication scheme called NS, which places replicas in a dispersed fashion so that the services on nodes gain immunity to failures with different patterns (e.g., network partition and single point of failure) while keeping replication overhead low. The second is a novel failure recovery scheme that exploits the independence between service recovery and structure recovery in the time domain to achieve quick failure recovery. Our simulation results indicate that the two proposed schemes outperform existing schemes and simple alternative schemes in service success rate, recovery latency, and communication cost.
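The idea of dispersed, location-aware replica placement mentioned in the abstract can be illustrated with a minimal sketch. This is not the paper's NS scheme, whose details are not given in this record; it is a generic greedy farthest-point heuristic, and the names `place_replicas`, `coords`, and the 2-D coordinate model are assumptions made only for illustration.

```python
import math


def distance(a, b):
    """Euclidean distance between two (x, y) points (assumed location model)."""
    return math.hypot(a[0] - b[0], a[1] - b[1])


def place_replicas(owner, candidates, coords, k):
    """Greedily pick up to k replica holders that are far from the owner
    and from each other, so that a localized failure or a partition is
    less likely to take out every copy.

    Hypothetical helper for illustration; not the NS algorithm itself.
    """
    holders = [owner]                      # nodes that already hold the service state
    pool = [n for n in candidates if n != owner]
    chosen = []
    while pool and len(chosen) < k:
        # Choose the candidate whose nearest existing holder is farthest away.
        best = max(
            pool,
            key=lambda n: min(distance(coords[n], coords[h]) for h in holders),
        )
        chosen.append(best)
        holders.append(best)
        pool.remove(best)
    return chosen


# Illustrative coordinates only; not data from the paper.
coords = {"a": (0, 0), "b": (1, 0), "c": (10, 0), "d": (0, 9), "e": (10, 10)}
replicas = place_replicas("a", list(coords), coords, k=2)
print(replicas)
```

With these sample coordinates the nearby node `b` is never selected, which reflects the intent of spreading copies across the network rather than clustering them next to the owner.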
Source: Journal of Computer Science & Technology (SCIE, EI, CSCD), 2012, No. 2, pp. 298-312 (15 pages). 计算机科学技术学报(英文版)
Funding: Supported by a National Science Foundation (NSF) grant from the CISE NetSE Program and CyberTrust Cross-Cutting Program of the USA; an IBM faculty award; an IBM SUR grant; a grant from the Intel Research Council; the National Basic Research 973 Program of China under Grant No. 2009CB320805; the National Natural Science Foundation of China under Grant No. 61170188; the National High Technology Research and Development 863 Program of China under Grant No. 2012AA011803; the Fundamental Research Funds for the Central Universities of China; and the China Scholarship Council (CSC).
Keywords: fault tolerance, failure recovery, replication, location, group communication
