
Storage optimization technology for Hive query
Cited by: 1
Abstract: To improve Hive query performance, the HDFS data block placement strategy was optimized, and a data block placement strategy based on correlation analysis was proposed. A concurrency relationship matrix and an intersection relationship matrix are constructed to evaluate the correlation between the data block to be placed and the data blocks already placed on each node. This correlation is combined with the access frequency of the target data block to select a suitable node for storage.
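The abstract names the ingredients of the strategy (two relationship matrices, access frequency, node selection) but not its formulas. The following is a minimal sketch of how such a selection could be wired together, not the authors' method. Everything specific here is an assumption made for illustration: the helper names `co_access_matrix` and `choose_node`, the construction of both matrices as pairwise co-occurrence counts, the additive cost with weights `w_concurrency`, `w_intersection`, and `w_load`, and the choice to place a block on the node where its correlation with existing blocks is lowest.

```python
from collections import defaultdict
from itertools import combinations


def co_access_matrix(access_groups):
    """Count, for every ordered pair of blocks, how many groups contain both.

    Assumption: a relationship matrix can be approximated by co-occurrence
    counts over groups of block IDs taken from a query log.
    """
    matrix = defaultdict(int)
    for group in access_groups:
        for a, b in combinations(sorted(set(group)), 2):
            matrix[(a, b)] += 1
            matrix[(b, a)] += 1
    return matrix


def choose_node(block, nodes, concurrency, intersection, frequency,
                w_concurrency=1.0, w_intersection=1.0, w_load=1.0):
    """Return the name of the node with the lowest hypothetical placement cost.

    `nodes` maps a node name to the list of blocks already stored on it.
    The cost formula is illustrative only; the source does not give one.
    """
    def cost(node):
        placed = nodes[node]
        # Correlation between the new block and blocks already on the node.
        correlation = sum(
            w_concurrency * concurrency.get((block, other), 0)
            + w_intersection * intersection.get((block, other), 0)
            for other in placed
        )
        # Access-frequency load already carried by the node.
        load = sum(frequency.get(other, 0) for other in placed)
        return correlation + w_load * frequency.get(block, 0) * load

    # Sorting the names makes tie-breaking deterministic.
    return min(sorted(nodes), key=cost)


# Hypothetical inputs: groups of blocks read together, and access counts.
concurrency = co_access_matrix([["b1", "b2"], ["b1", "b2"], ["b2", "b3"]])
intersection = co_access_matrix([["b1", "b3"]])
frequency = {"b1": 5, "b2": 3, "b3": 1}
nodes = {"node-a": ["b1"], "node-b": ["b3"]}

target = choose_node("b2", nodes, concurrency, intersection, frequency)
```

With these made-up inputs, `b2` is strongly correlated with `b1` and `node-a` already carries the hotter block, so the sketch selects `node-b`. Flipping the sign of the correlation term would instead co-locate correlated blocks; the abstract does not say which direction the paper uses.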
Authors: JING Zhonghang; ZHANG Wei; WANG Jiahui; MA Limin; XU Tao (Computer School, Beijing Information Science & Technology University, Beijing 100101, China; Beijing Advanced Innovation Center for Materials Genome Engineering, Beijing Information Science & Technology University, Beijing 100101, China; Information and Network Security Department, National Information Center, Beijing 100045, China; Research Center for Microprocessor and System-on-Chip Technology, Tsinghua University, Beijing 100084, China)
Source: Journal of Beijing Information Science and Technology University (Natural Science Edition), 2021, Issue 6, pp. 93-100 (8 pages)
Funding: Project of the Beijing Advanced Innovation Center for Materials Genome Engineering.
Keywords: Hive; MapReduce; storage optimization; query performance optimization
