HBase Massive Data Secondary Indexing Scheme Based on Elasticsearch (cited by: 2)

Abstract: HBase provides no native secondary index, its built-in Coprocessor is unstable, and retrieval over massive data is slow. To address these problems, this paper designs ELHBase (Elasticsearch Indexing HBase), a new HBase secondary indexing scheme based on Elasticsearch. The scheme uses Flume, Kafka, HBase, and Elasticsearch to build a big data processing framework for data collection, high-speed parsing, and ingestion. A custom Flume Sink collects the data, generates a corresponding ID for each record, and writes both to Kafka; a parsing stage then stores the data in HBase and stores the corresponding ID in Elasticsearch as the index. Without relying on the Coprocessor, the scheme adds an interface that queries Elasticsearch directly and uses the efficient, flexible, and varied retrieval functions of Elasticsearch to search massive HBase data quickly. Together, this addresses HBase's poor index performance, the instability of the Coprocessor, and the unsuitability of Elasticsearch for storing large volumes of data. Finally, secondary index performance is compared experimentally against SIHBase and hindex; the results show that the scheme writes faster and more stably than SIHBase and queries far faster than hindex.
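The data flow described in the abstract can be illustrated with a minimal in-memory sketch. This is not the author's implementation: the class `ELHBaseSketch`, its method names, and the choice of which fields get indexed are all hypothetical, and plain Python containers stand in for Kafka, HBase, and Elasticsearch so the example runs without any cluster. It only shows the idea of generating an ID at collection time, storing the full record under that ID, indexing the ID separately, and answering a query by consulting the index first and then fetching rows by ID.

```python
import uuid
from collections import defaultdict, deque


class ELHBaseSketch:
    """Toy model of the pipeline; every component is an in-memory stand-in."""

    def __init__(self):
        self.queue = deque()            # stands in for a Kafka topic
        self.hbase = {}                 # stands in for an HBase table: row ID -> record
        self.index = defaultdict(set)   # stands in for Elasticsearch: (field, value) -> row IDs

    def collect(self, record):
        """Custom-sink step: generate an ID for the record and enqueue both."""
        row_id = uuid.uuid4().hex
        self.queue.append((row_id, record))
        return row_id

    def parse_and_store(self, indexed_fields):
        """Parsing step: drain the queue, store full records, index the IDs.

        Which fields are indexed is an assumption of this sketch; the
        abstract only states that the ID is stored in Elasticsearch as the index.
        """
        while self.queue:
            row_id, record = self.queue.popleft()
            self.hbase[row_id] = record
            for field in indexed_fields:
                if field in record:
                    self.index[(field, record[field])].add(row_id)

    def search(self, field, value):
        """Query step: look up IDs in the index, then fetch the rows by ID."""
        row_ids = self.index.get((field, value), set())
        return [self.hbase[row_id] for row_id in sorted(row_ids)]


# Usage: two records are collected, parsed, and then retrieved by an indexed field.
store = ELHBaseSketch()
store.collect({"user": "alice", "city": "Shanghai", "msg": "login"})
store.collect({"user": "bob", "city": "Beijing", "msg": "logout"})
store.parse_and_store(indexed_fields=["user", "city"])
print(store.search("city", "Shanghai"))
```

Keeping the index small (IDs plus a few searchable fields) while the full records live in the row store mirrors the division of labor the abstract describes, where Elasticsearch handles retrieval and HBase handles bulk storage.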

Author: GUO Xue-feng (Network Security Technology R&D Center, The Third Research Institute of Ministry of Public Security, Shanghai 201204, China)

Affiliation: Network Security Technology R&D Center, The Third Research Institute of Ministry of Public Security

Source: Computer Knowledge and Technology (《电脑知识与技术》), 2020, Issue 1, pp. 5-7 (3 pages)

Keywords: massive data secondary index; ELHBase; custom Sink; rapid retrieval

Classification: TP31 [Automation and Computer Technology: Computer Software and Theory]

