
Research on Classification Analysis for Distributed Data Warehouses (cited by: 10)
Abstract: The GAC-RDB classification algorithm can only be applied to stand-alone data warehouses. To make data mining on cloud computing platforms more convenient and efficient, this paper builds on the distributed data warehouse HBase and on the implementation mechanism of GAC-RDB, devises an execution strategy suited to a distributed platform, and proposes a distributed GAC-RDB classification algorithm written in native HiveQL. Experiments show that the algorithm's running time declines steadily as nodes are added to the cluster. The results indicate that, with the algorithm's accuracy preserved, a distributed data warehouse effectively improves the scalability and running efficiency of GAC-RDB. Compared with the MapReduce framework, HiveQL lowers the technical requirements placed on data mining practitioners and considerably reduces the algorithm's development time, offering a new approach to mining massive data sets.
Source: Application Research of Computers (《计算机应用研究》), CSCD, Peking University Core Journal, 2013, No. 10, pp. 2936-2939, 2943 (5 pages)
Funding: National Natural Science Foundation of China (60873196); Fundamental Research Funds for the Central Universities (QN2009092)
Keywords: data mining; distributed data warehouse; classification analysis; GAC-RDB; Hadoop; HBase; Hive

