
Merging Optimization Strategy for Lucene Index Segments (cited by: 3)
Abstract With the growth of big data applications, efficient information search over massive data has become a research hotspot. The Lucene full-text search engine merges index segments to improve indexing efficiency, but most merges must load each index segment from disk, which consumes substantial system resources and lowers system throughput. To address this problem, an optimization strategy for Lucene index segment merging is proposed. The strategy uses a load coefficient to choose among different index segment merge operations. To speed up data retrieval, an index segment similarity evaluation model is further established to select the optimal set of index segments to merge. Experimental comparison with the existing Tiered, LogByte, and LogDoc merge strategies shows that the proposed strategy effectively reduces the number of index segment merges and improves system throughput and indexing efficiency.
Authors XIONG Anping (熊安萍); LI Chuangen (李传根); CAO Chunjiang (曹春江) (College of Computer Science and Technology, Chongqing University of Posts and Telecommunications, Chongqing 400065, P.R. China; Yunnan Branch of China Telecom Co., Ltd., Kunming 650000, P.R. China)
Source Journal of Chongqing University of Posts and Telecommunications (Natural Science Edition), indexed in CSCD and the Peking University Core Journals list, 2020, No. 1, pp. 105-112 (8 pages)
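Funding Chongqing Basic Science and Frontier Technology Research Project (cstc2017jcyjAX); Science and Technology Research Project of the Chongqing Municipal Education Commission (KJ1704085); Doctoral Start-up Fund of Chongqing University of Posts and Telecommunications (A2015-17)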
Keywords Lucene; index segment merging; load coefficient; index segment similarity; optimal merge segment set
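The abstract names two ideas, a load coefficient that gates which merge operation runs and a similarity model that picks the segments to merge, but it does not give their formulas. The following is only a minimal sketch of how such a two-step decision could be wired together. Everything in it is an assumption made for illustration and is not taken from the paper: the weighted-sum load coefficient, the size-ratio similarity measure, the threshold, the merge widths, and all function names. It does not use the Lucene API.

```python
def load_coefficient(cpu, io, mem, weights=(0.4, 0.4, 0.2)):
    """Hypothetical load coefficient: weighted sum of utilisations in [0, 1]."""
    w_cpu, w_io, w_mem = weights
    return w_cpu * cpu + w_io * io + w_mem * mem


def size_similarity(sizes):
    """Hypothetical similarity: smallest size over largest size.

    Sizes are assumed to be positive; 1.0 means all segments are equal in size.
    """
    return min(sizes) / max(sizes)


def pick_merge_set(segment_sizes, width):
    """Return the `width` size-adjacent segments with the highest similarity."""
    ordered = sorted(segment_sizes)
    windows = [ordered[i:i + width] for i in range(len(ordered) - width + 1)]
    return max(windows, key=size_similarity) if windows else []


def plan_merge(segment_sizes, load, busy_threshold=0.7,
               light_width=2, full_width=4):
    """Choose a narrow merge under high load and a wider merge otherwise."""
    width = light_width if load >= busy_threshold else full_width
    return pick_merge_set(segment_sizes, width)


if __name__ == "__main__":
    sizes_mb = [1, 2, 2, 3, 50, 100]  # made-up segment sizes
    busy = load_coefficient(cpu=0.9, io=0.8, mem=0.5)
    idle = load_coefficient(cpu=0.2, io=0.1, mem=0.3)
    print("busy node merges:", plan_merge(sizes_mb, busy))
    print("idle node merges:", plan_merge(sizes_mb, idle))
```

In this sketch a busy node merges only the two most similar segments, while an idle node merges a wider group, which mirrors the abstract's aim of cutting merge work when resources are scarce. The paper's actual load coefficient and similarity model may differ substantially.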
