Query Optimization of Big Data Based on HBase
Abstract: HBase has inherent strengths and inherent weaknesses. Its main weakness is a poor ability to locate data, that is, a poor query capability. Because of its column-oriented design, HBase can query only by rowkey as the primary key; it cannot perform multidimensional queries or join operations on a table, and queries usually fall back to full table scans, which consume considerable resources and yield low query efficiency. By analogy with query techniques in traditional databases, this paper studies the storage principles of HBase and uses the distributed computing framework MapReduce to build a secondary index on HBase. The index allows targeted positioning and efficient lookup within a table, and also eases the resource-scheduling pressure on the ZooKeeper service.
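Authors: Zhu Ming (朱明), Wang Zhirui (王志瑞)
Source: Intelligent Computer and Applications (《智能计算机与应用》), 2017, No. 4, pp. 59-61 (3 pages)
Funding: Jiangsu Province College Students' Innovation and Entrepreneurship Training Program, General Project (20161112216017); Jiangsu Province Modern Educational Technology Research Project (2016-R-46828)
Keywords: HBase; big data processing; secondary indexing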
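
The secondary-index idea summarized in the abstract can be illustrated with a minimal in-memory sketch. This is not the authors' implementation and does not use the HBase or MapReduce APIs: the table contents, the column name `info:city`, and all function names are hypothetical, chosen only to show how a map step that emits (column value, rowkey) pairs and a reduce step that groups them produce an index that replaces a full scan with direct rowkey lookups.

```python
from collections import defaultdict

# Toy stand-in for an HBase table (an assumption, not the HBase API):
# rowkey -> {"family:qualifier": value}.
TABLE = {
    "user001": {"info:city": "Nanjing", "info:age": "23"},
    "user002": {"info:city": "Suzhou", "info:age": "31"},
    "user003": {"info:city": "Nanjing", "info:age": "45"},
    "user004": {"info:city": "Wuxi", "info:age": "23"},
}


def map_phase(table, column):
    """Map step: emit (column value, rowkey) for every row that has the column."""
    for rowkey, cells in table.items():
        if column in cells:
            yield cells[column], rowkey


def reduce_phase(pairs):
    """Reduce step: group rowkeys by column value to form the index table."""
    index = defaultdict(list)
    for value, rowkey in pairs:
        index[value].append(rowkey)
    return {value: sorted(rowkeys) for value, rowkeys in index.items()}


def full_scan(table, column, wanted):
    """Baseline without an index: inspect every row of the table."""
    return sorted(k for k, cells in table.items() if cells.get(column) == wanted)


def indexed_lookup(table, index, wanted):
    """Index path: one lookup in the index, then point reads by rowkey."""
    return [(rowkey, table[rowkey]) for rowkey in index.get(wanted, [])]


city_index = reduce_phase(map_phase(TABLE, "info:city"))
nanjing_rows = indexed_lookup(TABLE, city_index, "Nanjing")
```

In this sketch the index maps each city to the rowkeys that hold it, so a query by city touches only the matching rows, whereas `full_scan` visits all of them. In a real deployment the index would itself be stored as a table and would need to be kept consistent with writes to the data table, which this sketch does not address.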