
Research on Data Skew in Spark
Abstract: We have entered the era of big data, and the processing of massive data sets has become a research hotspot in big data technology. Spark is a typical in-memory distributed big data processing framework, but the data skew that arises in practical Spark applications can significantly affect computational efficiency. This paper addresses the data skew problems that occur in various Spark applications: it reviews related research progress in China and abroad, analyzes and compares the optimization methods commonly used once data skew has occurred, and concludes with an outlook on future research directions.
Authors: ZHANG Zhan-feng; WANG Wen-li; GENG Shan-shan; JIA Zhi-ting (College of Information Technology, Hebei University of Economics and Business, Shijiazhuang, Hebei 050061, China)
Source: Journal of the Hebei Academy of Sciences (CAS), 2020, No. 1, pp. 1-7 (7 pages)
Funding: 2019 Hebei Province Graduate Student Innovation Funding Project (CXZZSS2019106).
Keywords: big data; Spark; data skew; data processing
