Performance optimization method of Lucene in big data (cited by: 2)
Abstract: To improve the efficiency of data query and analysis in big data environments, this paper combines in-memory computing with batch updating to propose an optimized inverted index method, the RAM-disk index (RFDirectory). A posting-list management technique that combines random access memory (RAM) and disk is implemented on top of Lucene. Newly added data are written to a cache and periodically flushed to the on-disk index structure, which improves the write performance of the inverted index. Queries are answered by integrating the multi-block inverted structures held on disk and in memory, so that users receive query and analysis results efficiently. Experimental results show that, in a big data environment, the index construction time of RFDirectory is reduced to 50% of that of the disk index (FSDirectory) or the in-memory index (RAMDirectory), and the time needed to return the search results for a single keyword is shortened by nearly 15%.
Authors: MA Yang (马旸), CAI Bing (蔡冰)
Source: Journal of Nanjing University of Science and Technology (《南京理工大学学报》), 2015, No. 3, pp. 260-265 (6 pages). Indexed in: EI, CAS, CSCD, Peking University Core Journals (北大核心).
Keywords: big data; Lucene; memory computing; batch processing; inverted index; post-list; cache; random access memory index; disk index; multiple block inverted structure
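
The write path and query path summarized in the abstract (buffer new postings in memory, flush them to disk in batches, and answer queries from both) can be illustrated with a small sketch. This is not the authors' RFDirectory and it does not use the Lucene API; it is a self-contained toy in plain Python. The class name `BufferedInvertedIndex`, the `flush_threshold` parameter, the JSON segment files, and the use of a document-count trigger in place of a periodic timer are all assumptions made for illustration.

```python
import json
import tempfile
from collections import defaultdict
from pathlib import Path


class BufferedInvertedIndex:
    """Toy inverted index: an in-memory buffer plus flushed on-disk segments.

    Hypothetical illustration only; not the RFDirectory implementation.
    """

    def __init__(self, directory, flush_threshold=1000):
        self.directory = Path(directory)
        self.directory.mkdir(parents=True, exist_ok=True)
        # Assumed knob: number of buffered documents that triggers a flush.
        self.flush_threshold = flush_threshold
        self.buffer = defaultdict(set)  # term -> doc ids not yet on disk
        self.buffered_docs = 0
        # Pick up any segments already present in the directory.
        self.segments = sorted(self.directory.glob("segment_*.json"))

    def add(self, doc_id, text):
        """Index a document in memory; flush once the batch is full."""
        for term in text.lower().split():
            self.buffer[term].add(doc_id)
        self.buffered_docs += 1
        if self.buffered_docs >= self.flush_threshold:
            self.flush()

    def flush(self):
        """Write the buffered postings to a new on-disk segment."""
        if not self.buffer:
            return
        path = self.directory / f"segment_{len(self.segments):06d}.json"
        postings = {term: sorted(ids) for term, ids in self.buffer.items()}
        path.write_text(json.dumps(postings), encoding="utf-8")
        self.segments.append(path)
        self.buffer.clear()
        self.buffered_docs = 0

    def search(self, term):
        """Merge postings from the memory buffer and every disk segment."""
        term = term.lower()
        hits = set(self.buffer.get(term, ()))
        for path in self.segments:
            postings = json.loads(path.read_text(encoding="utf-8"))
            hits.update(postings.get(term, ()))
        return sorted(hits)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        index = BufferedInvertedIndex(tmp, flush_threshold=2)
        index.add(1, "lucene inverted index")
        index.add(2, "big data index")      # second document triggers a flush
        index.add(3, "memory index cache")  # stays in the memory buffer
        print(index.search("index"))        # [1, 2, 3]
```

The sketch re-reads every segment on each query, which keeps it short but would be slow in practice; a real system would keep term dictionaries resident and merge segments in the background. The point it shows is only that writes touch memory first and that a query must consult both the buffer and the flushed segments to see all documents.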