HBase Massive Data Secondary Indexing Scheme Based on Elasticsearch
Times cited: 2
Abstract: HBase does not provide secondary indexes, its built-in Coprocessor mechanism is unstable, and retrieval over massive data is slow. To address these problems, this paper designs ELHBase (Elasticsearch Indexing HBase), a new HBase secondary indexing scheme based on Elasticsearch. The scheme uses Flume, Kafka, HBase, and Elasticsearch to build a big-data processing framework for data collection, high-speed parsing, and ingestion. A custom Flume Sink collects the data and, at the same time, generates a corresponding ID for each record; both are stored in Kafka. A parsing stage then writes the data to HBase and writes the corresponding ID to Elasticsearch as the index. Without relying on Coprocessors, the scheme adds an interface that queries Elasticsearch directly, using the efficient, flexible, and varied retrieval features of Elasticsearch to search HBase's massive data quickly. Together, these components address HBase's weak indexing performance, the instability of Coprocessors, and Elasticsearch's unsuitability for storing large volumes of data. Finally, secondary-index performance is compared experimentally against SIHBase and hindex: the scheme writes faster and more stably than SIHBase, and queries much faster than hindex.
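The write and query paths described in the abstract can be sketched as a small simulation. This is a minimal illustration under stated assumptions, not the author's implementation: Kafka, HBase, and Elasticsearch are replaced by hypothetical in-memory stand-ins (a `deque` for the Kafka topic, a dict for the HBase table, and a dict keyed by `(field, value)` for the Elasticsearch index). The function names `custom_sink`, `parse_and_ingest`, and `search`, the counter-based ID format, and the choice of indexed fields in `INDEXED_FIELDS` are all hypothetical; the abstract states only that the custom Sink generates a corresponding ID and that this ID is stored in Elasticsearch as the index.

```python
import itertools
from collections import deque

# Hypothetical in-memory stand-ins (assumptions, not the paper's implementation):
kafka_topic = deque()   # stands in for the Kafka topic fed by the custom Flume Sink
hbase_table = {}        # row key -> full record, standing in for the HBase table
es_index = {}           # (field, value) -> set of row keys, standing in for the Elasticsearch index

INDEXED_FIELDS = ("user", "city")   # hypothetical fields chosen for secondary indexing
_id_counter = itertools.count(1)    # hypothetical ID generator; a real Sink might use UUIDs instead


def custom_sink(record):
    """Custom-Sink step: attach a generated ID to the record and publish both to the topic."""
    row_id = f"row-{next(_id_counter):06d}"
    kafka_topic.append((row_id, record))
    return row_id


def parse_and_ingest():
    """Parsing step: full record goes to 'HBase'; ID plus indexed field values go to 'Elasticsearch'."""
    while kafka_topic:
        row_id, record = kafka_topic.popleft()
        hbase_table[row_id] = record
        for field in INDEXED_FIELDS:
            if field in record:
                es_index.setdefault((field, record[field]), set()).add(row_id)


def search(field, value):
    """Query path without a Coprocessor: find matching IDs in the index, then fetch rows by key."""
    row_ids = es_index.get((field, value), set())
    return [hbase_table[row_id] for row_id in sorted(row_ids)]


if __name__ == "__main__":
    custom_sink({"user": "alice", "city": "Shanghai", "msg": "hello"})
    custom_sink({"user": "bob", "city": "Beijing", "msg": "hi"})
    parse_and_ingest()
    print(search("city", "Shanghai"))  # [{'user': 'alice', 'city': 'Shanghai', 'msg': 'hello'}]
```

The split mirrors the design stated in the abstract: the index side holds only IDs and searchable values, so it stays small, while the full records live in the HBase-like store. In a real deployment, the two lookups in `search` would roughly correspond to an Elasticsearch query that returns matching document IDs, followed by HBase `Get` requests on those row keys. The sketch shows the data flow only and says nothing about performance.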
Author: GUO Xue-feng (郭雪峰), Network Security Technology R&D Center, The Third Research Institute of Ministry of Public Security, Shanghai 201204, China
Source: Computer Knowledge and Technology (《电脑知识与技术》), 2020, No. 1, pp. 5-7 (3 pages)
Keywords: massive data; secondary index; ELHBase; custom Sink; rapid retrieval