期刊文献+

基于Hive的性能优化研究 被引量:7

Performance optimization research based on Hive
下载PDF
导出
摘要 主要从Map Reduce作业调度和Hive性能调优两个方面对Hive的性能优化进行研究.对于Map Reduce主要从编程模型切入,分析其执行过程,并从map端、reduce端进行参数调优.接着从Hive框架角度入手,分别从分区表和外部表以及常用数据文件的压缩、行式存储与列式存储等方面进行深入研究.实验结果表明,snappy压缩、orcfile/parquet存储格式对于列式查询,提高查询效率,对于大数据分析平台有较好的兼容性. This paper research Hive performance optimization mainly from the two aspects of MapReduce scheduling and Hive performance tuning. MapReduce programming model and its implementation process is analyzed,and parameters are tuned from the map side and reduce side. Then Hive framework is researched from the aspects of the partition table,the external surface and common data file compression, the line storage and column type storage. The experimental results show that snappy compression and orcfile/parquet storage format can improve the efficiency of query for the column type queries, and has good compatibility.
出处 《上海师范大学学报(自然科学版)》 2017年第4期527-534,共8页 Journal of Shanghai Normal University(Natural Sciences)
关键词 数据仓库 作业调优 性能优化 压缩 存储格式 data warehouse job optimization performance optimization compression storage format
  • 相关文献

参考文献2

二级参考文献58

  • 1宁焕生,张瑜,刘芳丽,刘文明,渠慎丰.中国物联网信息服务系统研究[J].电子学报,2006,34(B12):2514-2517. 被引量:151
  • 2J Dean,S Ghemawat.MapReduce:Simplified data processing on large clusters[J].Communications of the ACM,2008,51(1):107-113.
  • 3J L Wagener.High performance fortran[J].Computer Standards & Interfaces,Elsevier,1996,18(4):371-377.
  • 4W Gropp,E Lusk,et al.Using MPI:Portable Parallel Programming with the Message Passing Interface[M].Cambridge:MIT Press,1999.1-350.
  • 5A Geist,A Beguelin,et al.PVM:Parallel Virtual Machine:A Users' Guide and Tutorial for Networked Parallel Computing[M].Cambridge:MIT Press,1995.1-299.
  • 6A Verma,N Zea,et al.Breaking the mapreduce stage barrier .Proc of IEEE International Conference on Cluster Computing .Los Alamitos:IEEE Computer Society,2010.235-244.
  • 7H C Yang,A Dasdan,et al.Map-Reduce-Merge:Simplified relational data processing .Proc of ACM SIGMOD International Conference on Management of Data .New York:ACM,2007.1029-1040.
  • 8S V Valvag,D Johansen.Oivos:Simple and efficient distributed data processing .Proc of IEEE International Conference on High Performance Computing and Communications .Piscataway:IEEE,2008.113-122.
  • 9Z Vrba,P Halvorsen,et al.Kahn process networks are a flexible alternative to mapreduce .Proc of IEEE International Conference on High Performance Computing and Communications .Piscataway:IEEE,2009.154-162.
  • 10Apache hadoop .http://lucene.apache.org/hadoop/,2010-10-15/2010-12-28.

共引文献189

同被引文献53

引证文献7

二级引证文献22

相关作者

内容加载中请稍等...

相关机构

内容加载中请稍等...

相关主题

内容加载中请稍等...

浏览历史

内容加载中请稍等...
;
使用帮助 返回顶部