
Wide Area Analytics for Geographically Distributed Datacenters (cited by: 1)

Abstract: Big data analytics, the process of organizing and analyzing data to extract useful information, is one of the primary uses of cloud services today. Traditionally, collections of data are stored and processed in a single datacenter. As data volumes grow at a tremendous rate, it becomes increasingly inefficient, in terms of performance, for a single datacenter to handle them. Large cloud service providers are therefore deploying datacenters around the world for better performance and availability. A widely used approach for analytics over geo-distributed data is the centralized approach, which aggregates all the raw data from local datacenters into a central datacenter. However, this approach has been observed to consume a significant amount of wide-area bandwidth, which degrades performance. A number of mechanisms have been proposed to optimize performance when data analytics are performed over geo-distributed datacenters. In this paper, we present a survey of the representative mechanisms proposed in the literature for wide area analytics. We discuss basic ideas, present the proposed architectures and mechanisms, and discuss several examples to illustrate existing work. We point out the limitations of these mechanisms, compare them, and conclude with our thoughts on future research directions.
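The abstract states that the centralized approach copies all raw data to one datacenter and so consumes a lot of bandwidth. The following minimal Python sketch makes that concrete for a simple count query. It compares the bytes that cross the wide area under centralized aggregation with one alternative, local pre-aggregation, where each site ships only per-key partial counts. The site names, the record and partial-result sizes, and the choice of pre-aggregation as the alternative are all illustrative assumptions, not details taken from the paper.

```python
from collections import Counter

# Illustrative assumptions (not from the paper): sizes of what crosses the WAN.
RECORD_BYTES = 100   # one raw log record
PARTIAL_BYTES = 16   # one (key, count) partial result


def centralized_wan_bytes(sites):
    """Centralized approach: every raw record is copied to the central datacenter."""
    return sum(len(records) for records in sites.values()) * RECORD_BYTES


def local_aggregation_wan_bytes(sites):
    """Alternative: each site counts its keys locally and ships only the partial counts."""
    return sum(len(Counter(records)) for records in sites.values()) * PARTIAL_BYTES


def global_counts(sites):
    """Merge per-site partial counts into the global result at the central site."""
    total = Counter()
    for records in sites.values():
        total.update(Counter(records))  # Counter.update adds counts
    return total


# Hypothetical raw event logs held in three datacenters.
sites = {
    "us-east": ["login", "click", "click", "login", "click"],
    "eu-west": ["click", "click", "purchase"],
    "ap-south": ["login", "login", "login", "click"],
}

# Merging partial counts yields the same answer as counting all raw records centrally.
all_records = [r for records in sites.values() for r in records]
assert global_counts(sites) == Counter(all_records)

print("centralized WAN bytes:", centralized_wan_bytes(sites))              # 1200
print("local aggregation WAN bytes:", local_aggregation_wan_bytes(sites))  # 96
```

The saving comes from shipping one partial result per distinct key per site instead of every raw record. This trick applies only to decomposable aggregates such as counts and sums. Actual gains depend on key cardinality, data skew, and the real cost of inter-datacenter links.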
Source: Tsinghua Science and Technology (Journal of Tsinghua University (Science and Technology), English edition), 2016, Issue 2, pp. 125–135 (11 pages). Indexed in SCIE, EI, CAS, CSCD.
Keywords: big data analytics; geo-distributed datacenters

