Abstract
Inverted index structures are the mainstay of modern text retrieval systems, but the off-line methods used to build and update them are not suitable for on-line update. This paper studies on-line index update methods, analyzes the strengths and weaknesses of the re-merge, in-place, and hybrid update strategies, and proposes an improved hybrid update strategy with a multilevel structure that combines the advantages of in-place and re-merge updates. Disk access complexity is used to measure the performance of the update strategies, and several common strategies and the hybrid strategies are analyzed theoretically and experimentally on very large text collections. Both the theoretical and the experimental results show that the improved hybrid strategy achieves better efficiency.
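To make the distinction between the strategies concrete, the following is a minimal in-memory sketch of a generic hybrid update, not the paper's multilevel algorithm, whose details the abstract does not give. It assumes that short postings lists live in one index that is rewritten on every flush (re-merge) and that long lists are appended to individually (in-place). The class name `HybridIndex` and the parameters `long_threshold` and `buffer_limit` are hypothetical, and dictionaries stand in for on-disk structures.

```python
from collections import defaultdict


class HybridIndex:
    """Toy hybrid index: long postings lists are updated in place,
    short ones are re-merged. All names here are illustrative assumptions."""

    def __init__(self, long_threshold=4, buffer_limit=3):
        self.long_threshold = long_threshold  # list length that triggers in-place handling
        self.buffer_limit = buffer_limit      # buffered postings that trigger a flush
        self.buffer = defaultdict(list)       # in-memory postings awaiting a flush
        self.merged = {}                      # short lists: stands in for one contiguous on-disk index
        self.in_place = {}                    # long lists: each stands in for its own extensible file
        self.buffered_postings = 0
        self.merge_count = 0

    def add_document(self, doc_id, terms):
        for term in set(terms):
            self.buffer[term].append(doc_id)
            self.buffered_postings += 1
        if self.buffered_postings >= self.buffer_limit:
            self.flush()

    def flush(self):
        # Re-merge: build a fresh copy of the short-list index.
        new_merged = dict(self.merged)
        for term, postings in self.buffer.items():
            if term in self.in_place:
                # In-place: append to the existing long list only.
                self.in_place[term].extend(postings)
            else:
                combined = new_merged.pop(term, []) + postings
                if len(combined) >= self.long_threshold:
                    # Promote a list that has grown long to in-place handling.
                    self.in_place[term] = combined
                else:
                    new_merged[term] = combined
        self.merged = new_merged
        self.merge_count += 1
        self.buffer.clear()
        self.buffered_postings = 0

    def lookup(self, term):
        stored = self.in_place.get(term) or self.merged.get(term, [])
        return sorted(stored + self.buffer.get(term, []))
```

In this sketch a frequent term is copied during merges only until its list reaches `long_threshold`; after that, each flush touches just the new postings for that term, which is the kind of saving a disk access cost model would capture.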
Source
Computer Engineering (《计算机工程》)
CAS
CSCD
PKU Core (北大核心)
2008, Issue 2, pp. 75-77, 88 (4 pages)
Keywords
inverted index
update strategy
inverted index structure