
PipelineJoin: A new MapReduce-based multi-table join algorithm (cited by: 3)
Abstract MapReduce, a parallel and distributed computing model, has been widely used to process join operations over two or more large tables. Existing MapReduce-based multi-table join algorithms have limitations when handling chain joins: some cannot join multiple large tables, while others must run many MapReduce jobs sequentially, which leads to low efficiency. To address this, a new MapReduce-based multi-table join algorithm, PipelineJoin, is proposed to efficiently perform chain joins over an arbitrary number of large tables. PipelineJoin adopts a pipeline model and a scheduler so that the Map tasks and Reduce tasks of the whole join process execute in an overlapping, pipelined fashion, which improves the efficiency of multi-table joins and overcomes the deficiencies of existing chain join methods. Extensive experiments on synthetic datasets of various sizes show that PipelineJoin effectively reduces join time compared with existing chain join algorithms.
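For readers unfamiliar with the terminology, the following is a minimal in-memory sketch of the sequential baseline that the abstract contrasts against: a chain join computed as a cascade of two-way reduce-side joins, one simulated MapReduce job per step. It is not the PipelineJoin algorithm itself, whose pipeline model and scheduler are not detailed in this record. All function names and the sample tables are hypothetical and chosen only for illustration.

```python
from collections import defaultdict

def map_phase(left, right, left_key, right_key):
    """Tag each row with its source side and emit (join_key, (tag, row))."""
    for row in left:
        yield row[left_key], ("L", row)
    for row in right:
        yield row[right_key], ("R", row)

def reduce_side_join(left, right, left_key, right_key):
    """One simulated MapReduce job: group by join key, then pair L and R rows."""
    groups = defaultdict(lambda: {"L": [], "R": []})
    for key, (tag, row) in map_phase(left, right, left_key, right_key):
        groups[key][tag].append(row)  # stands in for the shuffle step
    joined = []
    for key in sorted(groups):  # sorted only to make the output order stable
        for l_row in groups[key]["L"]:
            for r_row in groups[key]["R"]:
                joined.append({**l_row, **r_row})
    return joined

def chain_join(tables, keys):
    """Chain join: keys[i] is the column linking tables[i] and tables[i + 1].

    Each loop iteration must finish before the next one starts, which is
    the sequential behaviour the abstract describes as inefficient.
    """
    result = tables[0]
    for table, key in zip(tables[1:], keys):
        result = reduce_side_join(result, table, key, key)
    return result

# Hypothetical sample tables R1(a, b), R2(b, c), R3(c, d).
r1 = [{"a": 1, "b": 10}, {"a": 2, "b": 20}]
r2 = [{"b": 10, "c": 100}, {"b": 20, "c": 200}, {"b": 30, "c": 300}]
r3 = [{"c": 100, "d": "x"}, {"c": 300, "d": "y"}]
print(chain_join([r1, r2, r3], ["b", "c"]))
```

In this sketch a chain of n tables needs n - 1 jobs run back to back, and each intermediate result is fully materialised before the next job begins. According to the abstract, PipelineJoin instead overlaps the Map and Reduce tasks of those jobs.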
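Source Journal of University of Science and Technology of China (JUSTC), indexed in CAS, CSCD, and the Peking University Core Journals list, 2015, No. 10, pp. 836-845 (10 pages)
Funding Supported by the National Natural Science Foundation of China (61303004 1202012) and the National Key Technology R&D Program (863) (2015BAH16F00/F01/F02)
Keywords join; multi-table; MapReduce; PipelineJoin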

