
MapReduce Based Aggregate-Join Query Algorithms (基于MapReduce的连接聚集查询算法研究), cited by: 7
Abstract: The exponential growth of data poses serious challenges for data management and analysis. Aggregate-join queries are a common operation in data analysis, and MapReduce is a programming model for parallel processing of large-scale datasets, so studying MapReduce-based aggregate-join query algorithms has both academic and practical value. The paper first summarizes four MapReduce-based aggregate-join query algorithms by consolidating and extending existing join algorithms, then proposes two further algorithms tailored to different application scenarios. It also argues that I/O cost is the main factor determining the performance of MapReduce-based aggregate-join query algorithms. Finally, extensive experiments compare the six algorithms under different query workloads, identify the scenarios each one suits, and analyze how each algorithm's performance relates to the characteristics of the data.
Source: Journal of Computer Research and Development (《计算机研究与发展》), indexed in EI, CSCD, and PKU Core; 2013, Issue S1, pp. 306-311 (6 pages)
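Funding: National Natural Science Foundation of China (61202088); Natural Science Foundation of Liaoning Province (200102059); Fundamental Research Funds for the Central Universities (N120817001)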
Keywords: massive data; aggregate-join query; MapReduce; I/O cost; algorithm optimization
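
To make the notion of an aggregate-join query on MapReduce concrete, here is a minimal in-memory sketch. It is not one of the six algorithms from the paper, whose details this record does not give. It assumes a plain reduce-side (repartition) join followed by a grouped sum, and the table names, fields, and helper functions (`map_phase`, `shuffle`, `reduce_join`, `aggregate`) are all hypothetical.

```python
from collections import defaultdict

def map_phase(orders, customers):
    """Tag each record with its source table and emit it under the join key."""
    for cust_id, region in customers:
        yield cust_id, ("C", region)
    for cust_id, amount in orders:
        yield cust_id, ("O", amount)

def shuffle(pairs):
    """Group mapper output by key, standing in for the MapReduce shuffle."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_join(groups):
    """Join inside each key group and emit (region, amount) per match."""
    for values in groups.values():
        regions = [v for tag, v in values if tag == "C"]
        amounts = [v for tag, v in values if tag == "O"]
        for region in regions:
            for amount in amounts:
                yield region, amount

def aggregate(joined):
    """Sum amounts per region; on a cluster this could be a second job."""
    totals = defaultdict(int)
    for region, amount in joined:
        totals[region] += amount
    return dict(totals)

def aggregate_join(orders, customers):
    return aggregate(reduce_join(shuffle(map_phase(orders, customers))))

# Hypothetical sample data: (customer id, region) and (customer id, amount).
customers = [(1, "north"), (2, "south"), (3, "north")]
orders = [(1, 10), (1, 5), (2, 7), (3, 20), (4, 99)]
result = aggregate_join(orders, customers)
print(result)
```

In this sketch the order for customer 4 has no matching customer row and is dropped by the join. On a real cluster the joined records would typically be written out and read back between the join and the aggregation, which is one plausible place where the I/O cost highlighted in the abstract would arise.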
