Abstract
HBase provides no native secondary indexes, and Huawei's hindex scheme struggles to meet retrieval-speed requirements on massive data sets. To address this, the paper designs SIHBase (Solr Indexing HBase), a Solr-based secondary index scheme for HBase. SIHBase uses the HBase Coprocessor mechanism to implement callback functions for table creation, modification, and deletion, and for data insertion, update, deletion, and recovery. These callbacks send the corresponding requests to Solr, so that secondary indexes for HBase are built and maintained in Solr automatically and the data and index stay consistent. The scheme is general: it can index multiple columns of multiple tables at the same time. It also extends the HBase client with an interface for querying Solr directly, using Solr's efficient, flexible, and varied retrieval functions to search massive HBase data quickly. Finally, a comparative experiment on secondary-index query performance shows that SIHBase is much faster than hindex in query speed.
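The callback-driven index maintenance described in the abstract can be illustrated with a minimal in-memory sketch. This is not the paper's implementation and uses neither the HBase Coprocessor API nor Solr: `ToyTable`, `ToyIndex`, and the hook names `_post_put` and `_post_delete` are hypothetical stand-ins, chosen only to show how write-side hooks might keep a separate index consistent with the primary data. For brevity, `query` looks up row keys in the index and then fetches the rows from the table, which is an assumption of this sketch rather than the client interface the paper describes.

```python
from collections import defaultdict


class ToyIndex:
    """Hypothetical stand-in for the external index: (column, value) -> row keys."""

    def __init__(self):
        self._postings = defaultdict(set)

    def add(self, row_key, columns):
        for column, value in columns.items():
            self._postings[(column, value)].add(row_key)

    def remove(self, row_key, columns):
        for column, value in columns.items():
            self._postings[(column, value)].discard(row_key)

    def search(self, column, value):
        # Return matching row keys in sorted order for deterministic results.
        return sorted(self._postings.get((column, value), set()))


class ToyTable:
    """Hypothetical stand-in for a data table whose writes trigger index hooks."""

    def __init__(self, index, indexed_columns):
        self._rows = {}
        self._index = index
        self._indexed = set(indexed_columns)

    def _indexed_view(self, row):
        # Keep only the columns that are configured for indexing.
        return {c: v for c, v in row.items() if c in self._indexed}

    def put(self, row_key, columns):
        old = self._rows.get(row_key, {})
        new = {**old, **columns}
        self._rows[row_key] = new
        self._post_put(row_key, old, new)

    def delete(self, row_key):
        old = self._rows.pop(row_key, None)
        if old is not None:
            self._post_delete(row_key, old)

    def _post_put(self, row_key, old, new):
        # Hook after insert/update: drop stale index entries, then add fresh ones.
        self._index.remove(row_key, self._indexed_view(old))
        self._index.add(row_key, self._indexed_view(new))

    def _post_delete(self, row_key, old):
        # Hook after delete: remove the row's entries from the index.
        self._index.remove(row_key, self._indexed_view(old))

    def query(self, column, value):
        # Look up row keys in the index, then fetch the full rows.
        return [(k, self._rows[k]) for k in self._index.search(column, value)]


index = ToyIndex()
users = ToyTable(index, indexed_columns=["city"])
users.put("r1", {"city": "Chengdu", "name": "A"})
users.put("r2", {"city": "Chengdu", "name": "B"})
users.put("r1", {"city": "Beijing"})  # update moves r1 to a new index entry
users.delete("r2")                    # delete removes r2 from the index
```

Because every write path runs through a hook, the index never has to be rebuilt by scanning the table. In a real deployment the index lives in a separate service, so the hooks would also need to handle failed or delayed index requests, which this sketch ignores.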
Source
《信息网络安全》
CSCD
2017, No. 8, pp. 39-44 (6 pages)
Netinfo Security
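Funding
National Key Technology R&D Program of China [2012BAH18B05]
National Natural Science Foundation of China [61272447]
Science and Technology Department of Sichuan Province Program [16ZHSF0483]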