An Efficient On-line Index Update Strategy Based on a Hybrid Structure (cited by: 1)

On-line Update Strategy Based on High Performance of Hybrid
Abstract: Inverted index structures are widely used in information retrieval systems, but off-line methods for building and updating an inverted index are not suitable for on-line update. This paper studies on-line index update methods, analyzes the strengths and weaknesses of the re-merge, in-place, and hybrid update strategies, and presents an improved hybrid update strategy with a multilevel structure that combines the advantages of in-place update and re-merge update. Using disk access complexity as the performance measure, the paper analyzes several common update strategies and the hybrid strategies on very large text collections, both theoretically and experimentally. The results show that the improved hybrid strategy performs better.
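The abstract does not describe the algorithm in detail, so the following is only a toy sketch of the general idea behind a multilevel hybrid update, not the paper's method. It assumes that posting lists reaching a hypothetical `long_threshold` are moved to an in-place area and appended to from then on, while shorter lists live in runs that are re-merged whenever two runs occupy the same level. All names (`HybridIndex`, `merge_runs`, `long_threshold`, `buffer_limit`) are illustrative, and everything is held in memory rather than on disk.

```python
def merge_runs(older, newer):
    """Concatenate two runs term by term, older postings first."""
    merged = {term: list(postings) for term, postings in older.items()}
    for term, postings in newer.items():
        merged.setdefault(term, []).extend(postings)
    return merged


class HybridIndex:
    """Toy hybrid index: long lists updated in place, short lists re-merged."""

    def __init__(self, long_threshold=4, buffer_limit=2):
        self.long_threshold = long_threshold  # assumed cut-off for "long" lists
        self.buffer_limit = buffer_limit      # documents buffered before a flush
        self.buffer = {}                      # in-memory postings, term -> doc ids
        self.buffered_docs = 0
        self.in_place = {}                    # long lists, appended to in place
        self.levels = {}                      # level number -> run of short lists
        self.merges = 0                       # crude proxy for re-merge cost

    def add_document(self, doc_id, terms):
        for term in set(terms):
            self.buffer.setdefault(term, []).append(doc_id)
        self.buffered_docs += 1
        if self.buffered_docs >= self.buffer_limit:
            self.flush()

    def flush(self):
        # In-place update: terms that already have a long list are appended.
        run = {}
        for term, postings in self.buffer.items():
            if term in self.in_place:
                self.in_place[term].extend(postings)
            else:
                run[term] = postings
        self.buffer = {}
        self.buffered_docs = 0

        # Re-merge update: cascade merges while the target level is occupied.
        level = 0
        while level in self.levels:
            run = merge_runs(self.levels.pop(level), run)
            self.merges += 1
            level += 1

        # Move lists that are (or have become) long into the in-place area.
        for term in list(run):
            if term in self.in_place or len(run[term]) >= self.long_threshold:
                self.in_place.setdefault(term, []).extend(run.pop(term))
        self.levels[level] = run

    def lookup(self, term):
        """Collect postings from the in-place area, every level, and the buffer."""
        postings = list(self.in_place.get(term, []))
        for run in self.levels.values():
            postings.extend(run.get(term, []))
        postings.extend(self.buffer.get(term, []))
        return sorted(postings)
```

In this sketch a frequent term stops taking part in merges once its list is long, which is the intuition behind combining the two strategies: rare terms are cheap to re-merge, and frequent terms avoid being rewritten on every merge. A real implementation would measure cost in disk reads and writes rather than with the `merges` counter used here.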
Author: Zhao Liang (赵亮)
Source: Computer Engineering (《计算机工程》), indexed in CAS, CSCD, and the Peking University Core Journals list; 2008, No. 2, pp. 75-77, 88 (4 pages)
Keywords: inverted index; update strategy; inverted index structure

References (6)

  • 1 Witten I H, Moffat A, Bell T C. Managing Gigabytes[M]. San Francisco, CA: Morgan Kaufmann Publishers, 1999.
  • 2 Lester N, Zobel J, Williams H E. In-place Versus Re-build Versus Re-merge: Index Maintenance Strategies for Text Retrieval Systems[C]//Proc. of the 27th Conference on Australasian Computer Science. Dunedin, New Zealand: [s. n.], 2004, 26: 15-23.
  • 3 Büttcher S, Clarke C L A. Hybrid Index Maintenance for Growing Text Collections[EB/OL]. (2006-01-02). http://stefan.buettcher.org/papers/buettcher_2006 hybrid_index_maintenance_2.pdf.
  • 4 Lester N, Moffat A, Zobel J. Fast On-line Index Construction by Geometric Partitioning[C]//Proc. of the 14th ACM Conference on Information and Knowledge Management. Bremen, Germany: [s. n.], 2005.
  • 5 Zobel J. Inverted Files Versus Signature Files for Text Indexing[J]. ACM Transactions on Database Systems, 1998, 23(4): 453-490.
  • 6 Guan Yi, Wang Xiaolong, Zhang Kai. The Frequency-Rank Relationship of Language Units in Computational Language Models of Modern Chinese[J]. Journal of Chinese Information Processing, 1999, 13(2): 8-15. (cited by: 15)

Second-level references (1)

  • 1 Li W. IEEE Trans Information Theory, 1992, 38(6): 1842.

Documents sharing references (14)

Co-cited documents (9)

  • 1 Brin S, Page L. The Anatomy of a Large-Scale Hypertextual Web Search Engine[C]//Proc of the 7th Int'l World Wide Web Conf, 1998: 107-117.
  • 2 Ercegovac V, Josifovski V, Li Ning, et al. Supporting Sub-Document Updates and Queries in an Inverted Index[C]//Proc of the 17th ACM Conf on Information and Knowledge Management, 2008: 659-668.
  • 3 Long Xiaohui, Suel T. Three-Level Caching for Efficient Query Processing in Large Web Search Engines[C]//Proc of the 14th Int'l World Wide Web Conf (IW3C2), 2005: 257-266.
  • 4 Cambazoglu B B, Aykanat C. Performance of Query Processing Implementations in Ranking-Based Text Retrieval Systems Using Inverted Indices[J]. Information Processing & Management, 2006, 42(4): 875-898.
  • 5 Ng Ben Chung-Pun, Wang Cho-Li. Document Distribution Algorithm for Load Balancing on an Extensible Web Server Architecture[C]//Proc of the 1st IEEE/ACM Int'l Symp on Cluster Computing and the Grid, 2001: 140-147.
  • 6 Yao Quanzhu, Zhang Nan, Yang Zenghui, Tian Yuan. A Search Engine Based on Compressed Suffix Array Technology[J]. Computer Engineering, 2008, 34(10): 83-85. (cited by: 2)
  • 7 Jiang Wei, Hao Wenning, Yang Xiaojia, Jin Dawei. Index Construction and Optimization for a Distributed Database Search Engine[J]. Computer Engineering, 2008, 34(18): 36-38. (cited by: 7)
  • 8 Zhuang Yi, Zhuang Yueting, Wu Fei. An Integrated Index Structure Supporting Large-Scale Cross-Media Retrieval[J]. Journal of Software, 2008, 19(10): 2667-2680. (cited by: 13)
  • 9 Guo Ruijie, Cheng Xueqi, Xu Hongbo, Wang Bin, Ding Guodong. A Fast On-line Index Construction Method Based on a Dynamic Balancing Tree[J]. Journal of Computer Research and Development, 2008, 45(10): 1769-1775. (cited by: 5)

Citing documents (1)

Second-level citing documents (1)
