Hybrid Join Algorithm Based on MapReduce
Abstract: Hive, the data warehouse that runs on Hadoop, lets more users process Hadoop data through a SQL interface. However, Hive does not provide an efficient approach to joins, a common and very time-consuming operation in Hadoop. To address join performance in Hadoop, this paper proposes HJ-A, a hybrid-strategy join algorithm that chooses the relatively more suitable of several join algorithms according to the current application scenario. Experimental results show that HJ-A performs well in most Hadoop scenarios.
Authors: Hu Long (胡龙), Luo Jun (罗军)
Source: Computer and Modernization (《计算机与现代化》), 2015, No. 6, pp. 86-91 (6 pages)
Keywords: MapReduce; Hadoop; partition join; auto-tuning; Hive
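
The abstract does not say which join algorithms HJ-A chooses among or what criteria it uses, so the following is only an illustrative sketch of the general idea of a hybrid join, not the authors' method. It assumes two candidate strategies that are common in MapReduce systems, a broadcast (map-side) join and a repartition (reduce-side) join, and it assumes a selection rule based solely on input size. The names `hybrid_join`, `broadcast_join`, `repartition_join`, and `BROADCAST_ROW_LIMIT` are hypothetical, and the whole thing runs in memory rather than on Hadoop.

```python
from collections import defaultdict

# Hypothetical threshold: number of rows one mapper is assumed to hold in memory.
BROADCAST_ROW_LIMIT = 1000


def broadcast_join(small, large):
    """Map-side join: hash the small input, then stream the large one past it."""
    table = defaultdict(list)
    for key, value in small:
        table[key].append(value)
    # Emits (key, small_value, large_value) for every matching pair.
    return [(key, s, l) for key, l in large for s in table.get(key, [])]


def repartition_join(left, right, num_reducers=4):
    """Reduce-side join: route rows to partitions by key hash, join per partition."""
    partitions = [defaultdict(lambda: ([], [])) for _ in range(num_reducers)]
    for key, value in left:
        partitions[hash(key) % num_reducers][key][0].append(value)
    for key, value in right:
        partitions[hash(key) % num_reducers][key][1].append(value)
    out = []
    for part in partitions:
        for key, (lefts, rights) in part.items():
            out.extend((key, l, r) for l in lefts for r in rights)
    return out


def hybrid_join(left, right, row_limit=BROADCAST_ROW_LIMIT):
    """Pick a strategy from the input sizes; returns (strategy_name, joined_rows)."""
    if min(len(left), len(right)) <= row_limit:
        if len(left) <= len(right):
            return "broadcast", broadcast_join(left, right)
        # Broadcast the smaller right side, then restore (key, left, right) order.
        rows = broadcast_join(right, left)
        return "broadcast", [(k, l, r) for k, r, l in rows]
    return "repartition", repartition_join(left, right)
```

With small inputs such as `hybrid_join([(1, "ann"), (2, "bob")], [(1, "book"), (3, "ink")])` the sketch picks the broadcast path; passing `row_limit=0` forces the repartition path, and both return the same joined rows. A real selector would likely also weigh factors such as data skew and cluster configuration, which this sketch ignores.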