SDD-1 Improved Algorithm Used in Hive (cited by: 7)
Abstract: To address the long execution time and heavy bandwidth consumption of join queries in Hive, this paper proposes an improved SDD-1 algorithm based on data preprocessing and double semi-joins. First, a preprocessing step is introduced: each distributed node merge-sorts its raw data, which reduces the number of data-mapping operations at the aggregation node and speeds up processing. Second, a double semi-join technique based on rows and columns further reduces the volume of data transferred between nodes, lowering bandwidth consumption. Simulation experiments show that, compared with the original Hive join algorithm, the improved algorithm raises query speed by 10% when the number of tuples reaches 5,000 and 8,000, effectively shortening query processing and response time. The improved algorithm can also be readily applied to other cloud computing platforms.
Source: Natural Science Journal of Xiangtan University (《湘潭大学自然科学学报》), indexed in CAS and the Peking University Core Journals list, 2014, No. 4, pp. 77-82 (6 pages)
Funding: National Natural Science Foundation of China (61072002)
Keywords: data preprocessing; double semi-join; improved SDD-1 algorithm
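
The two ideas named in the abstract, per-node sorting followed by semi-join reduction in both directions, can be illustrated with a minimal sketch. This is not the paper's algorithm and does not use any Hive API: the function names, the dictionary-based tuples, and the reading of "rows and columns" as "ship only the join-key column, then filter rows" are all assumptions made for illustration.

```python
from bisect import bisect_left

def local_preprocess(rows, key):
    """Sort one node's tuples by join key (stand-in for the per-node merge sort)."""
    return sorted(rows, key=lambda r: r[key])

def project_keys(rows, key):
    """Column reduction (assumed): ship only the distinct join-key values, sorted."""
    return sorted({r[key] for r in rows})

def semi_join(rows, key, sorted_keys):
    """Row reduction: keep tuples whose join key occurs in sorted_keys."""
    kept = []
    for r in rows:
        i = bisect_left(sorted_keys, r[key])  # binary search in the sorted key list
        if i < len(sorted_keys) and sorted_keys[i] == r[key]:
            kept.append(r)
    return kept

def double_semi_join(r_rows, s_rows, key):
    """Reduce both relations with semi-joins before the final join (hypothetical helper)."""
    r_sorted = local_preprocess(r_rows, key)
    s_sorted = local_preprocess(s_rows, key)
    # Node R sends only its key column to node S; S drops non-matching rows.
    s_reduced = semi_join(s_sorted, key, project_keys(r_sorted, key))
    # Node S sends the keys that survived back to R; R drops non-matching rows.
    r_reduced = semi_join(r_sorted, key, project_keys(s_reduced, key))
    # Only the reduced relations take part in the final join.
    joined = [{**r, **s} for r in r_reduced for s in s_reduced if r[key] == s[key]]
    return joined, r_reduced, s_reduced

# Illustrative data, not taken from the paper.
orders = [{"cid": 3, "item": "a"}, {"cid": 1, "item": "b"}, {"cid": 7, "item": "c"}]
customers = [{"cid": 1, "name": "x"}, {"cid": 2, "name": "y"}, {"cid": 3, "name": "z"}]
joined, orders_reduced, customers_reduced = double_semi_join(orders, customers, "cid")
```

In this toy run each relation shrinks from three tuples to two before the final join, which is the kind of transfer reduction the abstract attributes to the double semi-join step.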