
Water Census Data Warehouse Based on Hive (基于Hive的水利普查数据仓库)  Cited by: 9
Abstract  Water census data are massive and multidimensional. This paper studies Hadoop and Hive, which have developed rapidly in recent years under the "big data" concept, and combines them with the mature multidimensional analysis techniques of traditional data warehouses to propose a method for building a water census data warehouse based on Hive. It describes the architecture of the data warehouse system and, in line with Hive's design characteristics, improves the traditional multidimensional analysis model through bucketing, reducing the number of dimension tables, and adding redundancy to fact tables. Finally, a cluster is set up to run query and analysis tests on a water census data set. The test results show that the data warehouse can meet the storage and query requirements of massive multidimensional water census data.
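The abstract does not show the paper's schema or its Hive statements, so the following is only a minimal Python sketch of the two ideas it names: folding a small dimension table into the fact table (redundancy in place of a join) and distributing rows into buckets by a hash of a key column. All table names, column names, and values (`regions`, `station_id`, `capacity`, and so on) are hypothetical. The bucket rule used here, `(hash & Integer.MAX_VALUE) % num_buckets` with an integer key hashing to itself, is an assumption about how Hive assigns integer keys to buckets, not something stated in the abstract.

```python
from collections import defaultdict

INT_MAX = 2**31 - 1  # Java's Integer.MAX_VALUE


def bucket_for(key: int, num_buckets: int) -> int:
    """Bucket index for an integer key.

    Assumption: an integer key hashes to its own value and the bucket is
    (hash & INT_MAX) % num_buckets.
    """
    return (key & INT_MAX) % num_buckets


def denormalize(facts, regions):
    """Copy the region name from a small dimension table into every fact row,
    so later queries do not need a join against that dimension."""
    return [dict(row, region_name=regions[row["region_id"]]) for row in facts]


def bucketize(rows, key, num_buckets):
    """Group rows by the bucket index of the given key column."""
    buckets = defaultdict(list)
    for row in rows:
        buckets[bucket_for(row[key], num_buckets)].append(row)
    return buckets


# Hypothetical dimension and fact data, for illustration only.
regions = {1: "North", 2: "South"}
facts = [
    {"station_id": 101, "region_id": 1, "capacity": 5.0},
    {"station_id": 102, "region_id": 2, "capacity": 7.5},
    {"station_id": 105, "region_id": 1, "capacity": 2.5},
]

wide_facts = denormalize(facts, regions)          # redundant fact table
buckets = bucketize(wide_facts, "station_id", 4)  # 4 buckets on station_id

# A lookup on station_id only has to scan the one bucket the key maps to.
target = 105
hits = [r for r in buckets[bucket_for(target, 4)] if r["station_id"] == target]
```

In this sketch a point lookup touches one bucket out of four instead of the whole table, and the region name is read directly from the widened fact row. The trade-off is extra storage for the repeated dimension values.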
Source  Computer and Modernization (《计算机与现代化》), 2014, No. 5, pp. 127-130 (4 pages)
Funding  National Natural Science Foundation of China (51079040); Ministry of Water Resources 948 Project (201016)
Keywords  data warehouse; water census; model optimization; large-scale data processing; Hive