
A multi-table connection strategy based on Hadoop (cited by: 2)
Abstract: When Hadoop processes a multi-table connection (join), every round writes a large volume of intermediate results to local disk, which severely reduces the processing efficiency of the system. To address this problem, a "Replace-Query" method is proposed. The method builds indexes on the tables being joined and replaces the tuple sets that would otherwise be output with index information in the intermediate results. The tuples then take part in the whole multi-table join in index form, which reduces the I/O cost of the intermediate results, and the full records are recovered by querying the indexes. A buffer pool, secondary sort, and multi-threading are used to manage the index information and speed up index queries. Finally, an experiment comparing the method with the original Hadoop was designed on the TPC-H data set. The results show that the method reduces storage space by 35.5% and improves running efficiency by 12.9%.
Source: Modern Electronics Technique (《现代电子技术》), 2014, No. 6, pp. 90-94 (5 pages)
Funding: Key Program of the National Natural Science Foundation of China (61033007); National Basic Research Program of China ("973" Program) (2012CB316203)
Keywords: multi-table connection; replace-query; index; buffer pool; secondary sorting
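The "Replace-Query" idea in the abstract can be illustrated with a small, single-process Python sketch. It is not the authors' Hadoop implementation: the function and class names (`join_round`, `BufferPool`, `materialize`) are hypothetical, in-memory lists stand in for indexed tables, a list position stands in for the index information, and the toy rows are only loosely modeled on TPC-H. Sorting the id tuples before lookup is used here as a rough stand-in for the secondary sort, and multi-threading is omitted.

```python
from collections import OrderedDict

# Toy tables (assumed data). A row's list position acts as its index entry.
customers = [(1, "Alice"), (2, "Bob")]                  # (custkey, name)
orders = [(10, 1), (11, 2), (12, 1)]                    # (orderkey, custkey)
lineitems = [(10, "bolt"), (12, "nut"), (12, "gear")]   # (orderkey, part)


def join_round(intermediate, table, key_col, next_key_col=None):
    """"Replace" step: one join round that emits row ids, not full tuples.

    intermediate is a list of (join_key, ids) records. Each output record
    carries only the key needed by the next round plus the extended id tuple.
    """
    buckets = {}
    for row_id, row in enumerate(table):
        buckets.setdefault(row[key_col], []).append(row_id)
    out = []
    for key, ids in intermediate:
        for row_id in buckets.get(key, []):
            next_key = None if next_key_col is None else table[row_id][next_key_col]
            out.append((next_key, ids + (row_id,)))
    return out


class BufferPool:
    """Small LRU cache in front of a table, counting hits and misses."""

    def __init__(self, table, capacity=2):
        self._table = table
        self._capacity = capacity
        self._cache = OrderedDict()
        self.hits = 0
        self.misses = 0

    def get(self, row_id):
        if row_id in self._cache:
            self._cache.move_to_end(row_id)  # mark as most recently used
            self.hits += 1
            return self._cache[row_id]
        self.misses += 1
        row = self._table[row_id]            # the simulated index query
        self._cache[row_id] = row
        if len(self._cache) > self._capacity:
            self._cache.popitem(last=False)  # evict least recently used
        return row


def materialize(intermediate, pools):
    """"Query" step: recover full tuples from the id tuples.

    Records are sorted by id tuple first so that repeated ids are adjacent,
    which lets the buffer pools serve them from cache.
    """
    results = []
    for _, ids in sorted(intermediate, key=lambda rec: rec[1]):
        row = ()
        for pool, row_id in zip(pools, ids):
            row += pool.get(row_id)
        results.append(row)
    return results


# Round 0: seed with customer ids keyed by custkey.
round0 = [(row[0], (cid,)) for cid, row in enumerate(customers)]
# Round 1: join orders on custkey; carry orderkey forward for the next round.
round1 = join_round(round0, orders, key_col=1, next_key_col=0)
# Round 2: join lineitems on orderkey; no further key is needed.
round2 = join_round(round1, lineitems, key_col=0)

pools = [BufferPool(customers), BufferPool(orders), BufferPool(lineitems)]
rows = materialize(round2, pools)
```

In this sketch each intermediate record holds a few integers instead of the concatenated rows, which is the source of the I/O saving the abstract describes. The savings reported in the paper come from its own experiments and are not reproduced by this toy example.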


