
Survey on Performance Optimization Technologies for Spark (Spark性能优化技术研究综述) Cited by: 21
Abstract: In recent years, with the arrival of the big data era, big data processing platforms have developed rapidly, producing platforms such as Hadoop, Spark, and Storm, among which Spark is the most prominent. Although Spark is now widely used in China and abroad, many of its performance problems remain unsolved. Because Spark's underlying execution mechanism is highly complex, users find it hard to locate performance bottlenecks, let alone optimize further. To address these problems, this survey summarizes and analyzes current Spark optimization techniques from five aspects: development principle optimization, memory optimization, configuration parameter optimization, scheduling optimization, and shuffle process optimization. Finally, it summarizes the key open problems in Spark optimization and proposes the main directions for future research.
Authors: LIAO Hu-sheng (廖湖声); HUANG Shan-shan (黄珊珊); XU Jun-gang (徐俊刚); LIU Ren-feng (刘仁峰) (Faculty of Information Technology, Beijing University of Technology, Beijing 100124, China; School of Computer and Control Engineering, University of Chinese Academy of Sciences, Beijing 101408, China)
Source: Computer Science (《计算机科学》), indexed in CSCD and the Peking University Core Journals list, 2018, No. 7, pp. 7-15, 37 (10 pages in total)
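Funding: Supported by the National Natural Science Foundation of China project "Research on Performance Analysis Methods for Parallel Programs in the Cloud" (61372171)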
Keywords: Spark; development principle optimization; configuration parameter optimization; memory optimization; scheduling optimization; shuffle process optimization
