期刊文献+

非均匀数据分布下的MapReduce连接查询算法优化 被引量:1

Join Query Optimization Based on MapReduce under Skewed Data
下载PDF
导出
摘要 MapReduce分布式计算框架有助于提升大规模数据连接查询的效率,但当连接属性分布不均匀时,其简单的散列策略容易导致计算节点间负载不均衡,影响作业的整体性能。针对连接查询操作中的数据倾斜问题,研究了MapReduce框架下大规模数据连接查询操作的优化算法。首先对经典的改进重分区连接查询算法进行实验分析,研究了传统MapReduce计算框架下连接查询操作的执行流程,找出了基于MapReduce计算框架的连接查询算法在数据分布不均匀时的性能瓶颈;进而提出了组合分割平衡分区优化策略,设计并实现了基于组合分割平衡分区优化策略的改进型连接查询算法。实验结果表明,提出的优化策略在大规模数据的连接查询处理上很好地解决了数据倾斜带来的性能影响,具有好的时间性能和可扩展性。 MapReduce,a classic distributed computing environment,can improve the performance of join query on large-scale data,but when the join attributes do not follow a uniform distribution,the pure hash strategy in traditional MapReduce will lead to load imbalance over computing nodes,which will reduce the performance of overall task.Aiming at the data skew problem in the join query,this paper studies the join query optimization based on MapReduce computing framework.Firstly,this paper conducts experimental analysis for the improved repartitioning join query algorithm,studies the execution phases of join query based on traditional MapReduce computing framework,and finds the performance bottlenecks of join query on MapReduce computing framework when data do not follow a uniform distribution.Based on the above,this paper designs and implements an improved join query optimization algorithm,which is based on an execution strategy by integrating the combination segmentation method and equilibrium partitioning method.The experimental results show that the proposed optimization method provides a good solution for distributed join query on large-scale skewed datasets,and presents an excellent time performance and scalability.
作者 张敬伟 尚宏佳 钱俊彦 周萍 杨青 ZHANG Jingwei;SHANG Hongjia;QIAN Junyan;ZHOU Ping;YANG Qing(Guangxi Key Laboratory of Trusted Software, Guilin University of Electronic Technology, Guilin, Guangxi 541004, China;Guangxi Cooperative Innovation Center of Cloud Computing and Big Data, Guilin University of Electronic Technology,Guilin, Guangxi 541004, China;Guangxi Key Laboratory of Automatic Measurement Technology and Instrument, Guilin University of Electronic Technology, Guilin, Guangxi 541004, China)
出处 《计算机科学与探索》 CSCD 北大核心 2017年第5期752-767,共16页 Journal of Frontiers of Computer Science and Technology
基金 国家自然科学基金Nos.U1501252 61363005 61462017 广西自然科学基金Nos.2014GXNSFAA118353 2014GXNSFAA118390 2014GXNSFDA118036 广西高等学校高水平创新团队及卓越学者计划 广西云计算与大数据协同创新中心基金项目 广西物联网技术与产业化推进协同创新中心资助项目~~
关键词 连接查询 MAPREDUCE 数据倾斜 join query MapReduce skewed data
  • 相关文献

参考文献1

二级参考文献20

  • 1Ghemawat S, Gobioff H, Leung ST. The Google file system. In: Proc. of the SOSP 2003. 2003.20-43. [doi: 10.1145/1165389. 945450].
  • 2Dean J, Ghemawat S. MapReduce: Simplified data processing on large clusters. In: Proc. of the OSDI 2004. 2004. 137-150. [doi: 10.1145/1327452.1327492].
  • 3Yang HC, Dasdan A, Hsiao RL, Parker DS. Map-Reduce-Merge: Simplified relational data processing on large cluster. In: Proc. of the SIGMOD 2007. 2007. 1029-1040. [doi: 10.1145/1247480.1247602].
  • 4Lammel R. Google's MapReduce programming model Revisited. Science Computer Program, 2008,70(1):1-30. [doi: 10.1016/ j .scico .2007.07.001 ].
  • 5Thusoo A, Sarma JS, Jain N, Shao Z, Chakka P, Anthony S, Liu H, Wyckoff P, Murthy R. Hi:ce: A warehousing solution over a map-reduce framework. Proc. of the VLDB Endowment, 2009,2(2): 1626-1627.
  • 6Thusoo A, Sarma JS, Jain N, Shao Z, Chakka P, Zhang N, Antony S, Liu H, Murthy R. Hive--A petabyte scale data warehouse using Hadoop data engineering. In: Proc. of the ICDE. 2010. 996-1005. [doi: 10.1109/ICDE.2010.5447738].
  • 7Olston C, Reed B, Sirvastava U, Kumar R, Tomkins A. Pig Latin: A not-so-foreign language for data processing. In: Proc. of the SIGMOD. 2008. 1099-1110. [doi: 10.1145/1376616.1376726].
  • 8White T. Hadoop: The Definitive Guide. O'Reilly, 2009.
  • 9Apache Hadoop. http://hadoop.apache.org/.
  • 10Murty J. Programming Amazon Web Services: S3, EC2, SQS, FPS, and SimpleDB. O'Reilly, 2008.

共引文献35

同被引文献4

引证文献1

相关作者

内容加载中请稍等...

相关机构

内容加载中请稍等...

相关主题

内容加载中请稍等...

浏览历史

内容加载中请稍等...
;
使用帮助 返回顶部