Abstract
HBase has inherent strengths and an inherent weakness: its poor ability to locate data, that is, its query capability. Because HBase is column-oriented, it can query only by the rowkey as the primary key. It cannot perform multidimensional queries or join operations on a table, and queries on anything other than the rowkey usually fall back to full table scans, which consume considerable resources and are slow. Drawing on the query techniques of traditional relational databases, this paper studies the storage principles of HBase and uses the distributed computing framework MapReduce to build a secondary index on HBase. The index enables targeted data location and efficient lookups, and it also reduces the resource-scheduling pressure on the ZooKeeper service.
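The abstract names the approach but does not describe it. The following is a minimal sketch of the general secondary-index idea, not the paper's implementation. It assumes an in-memory dict stands in for an HBase table and plain Python functions stand in for the MapReduce map and reduce phases. The table contents, the column name `info:city`, and all function names are hypothetical. One pass inverts "column value → rowkeys", after which value queries become lookups instead of full table scans.

```python
from collections import defaultdict
from typing import Dict, Iterator, List, Tuple

# Hypothetical in-memory stand-in for an HBase table: rowkey -> {"family:qualifier": value}.
Table = Dict[str, Dict[str, str]]


def map_phase(table: Table, column: str) -> Iterator[Tuple[str, str]]:
    """Map: for every row that has the indexed column, emit (column value, rowkey)."""
    for rowkey, cells in table.items():
        if column in cells:
            yield cells[column], rowkey


def reduce_phase(pairs: Iterator[Tuple[str, str]]) -> Dict[str, List[str]]:
    """Shuffle + reduce: group rowkeys by value; keys and rowkeys are sorted, as HBase sorts rows."""
    grouped: Dict[str, List[str]] = defaultdict(list)
    for value, rowkey in pairs:
        grouped[value].append(rowkey)
    # Python dicts keep insertion order, so the result is ordered by value.
    return {value: sorted(rowkeys) for value, rowkeys in sorted(grouped.items())}


def build_secondary_index(table: Table, column: str) -> Dict[str, List[str]]:
    """Run the simulated MapReduce job once to build the index."""
    return reduce_phase(map_phase(table, column))


def query_by_index(table: Table, index: Dict[str, List[str]], value: str) -> List[Dict[str, str]]:
    """Indexed query: value -> rowkeys via the index, then one rowkey lookup per hit."""
    return [table[rowkey] for rowkey in index.get(value, [])]


def query_by_full_scan(table: Table, column: str, value: str) -> List[Dict[str, str]]:
    """Baseline without an index: examine every row in rowkey order."""
    return [cells for _, cells in sorted(table.items()) if cells.get(column) == value]


# Hypothetical sample data.
users: Table = {
    "u001": {"info:city": "Nanjing", "info:name": "Li"},
    "u002": {"info:city": "Suzhou", "info:name": "Wang"},
    "u003": {"info:city": "Nanjing", "info:name": "Zhang"},
}

city_index = build_secondary_index(users, "info:city")
print(city_index)  # {'Nanjing': ['u001', 'u003'], 'Suzhou': ['u002']}
print(query_by_index(users, city_index, "Nanjing") == query_by_full_scan(users, "info:city", "Nanjing"))  # True
```

Building the index costs one full pass over the table, and that cost is then amortized over many queries. In a real HBase deployment the reduce output would typically be written to a separate index table, for example with the indexed value as a rowkey prefix, so that a short prefix scan replaces the full scan. This abstract does not say how the paper lays out its index table.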
Source
Intelligent Computer and Applications (《智能计算机与应用》)
2017, No. 4, pp. 59-61 (3 pages)
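Funding
Jiangsu Province College Student Innovation and Entrepreneurship Training Program, General Project (20161112216017)
Jiangsu Province Modern Educational Technology Research Project (2016-R-46828)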