Space and time optimized algorithm of n-Gram/2L index structure

Abstract: This paper improves the index structure of the n-Gram/2L text retrieval algorithm by adding an index on document identifiers to the second-level inverted index, and proposes n-Gram/2LZ (n-Gram/2L on Zigzag join), a retrieval algorithm based on Zigzag join. When indexing and searching documents with a large volume of data, the algorithm retains the properties of the original n-Gram/2L algorithm while further reducing index redundancy and index storage. The optimized query algorithm also lowers the system overhead of query processing and improves query efficiency by reducing the number of times records in the index are accessed.
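The abstract does not spell out how n-Gram/2LZ performs its join, so the following is only a minimal sketch of the general Zigzag join idea it builds on: intersecting two sorted lists of document identifiers by jumping ahead in whichever list is behind, rather than reading every record. The function name `zigzag_join` and the sample posting lists are hypothetical and are not taken from the paper.

```python
from bisect import bisect_left


def zigzag_join(a, b):
    """Intersect two sorted lists of document identifiers.

    Illustrative only: whichever list is behind skips forward with a
    binary search to the other list's current identifier, so records
    that cannot match are never visited one by one.
    """
    i, j, out = 0, 0, []
    while i < len(a) and j < len(b):
        if a[i] == b[j]:
            out.append(a[i])
            i += 1
            j += 1
        elif a[i] < b[j]:
            # Jump in `a` to the first identifier >= b[j].
            i = bisect_left(a, b[j], i + 1)
        else:
            # Jump in `b` to the first identifier >= a[i].
            j = bisect_left(b, a[i], j + 1)
    return out


# Hypothetical posting lists for two query terms.
print(zigzag_join([1, 3, 5, 7, 9, 11], [3, 4, 9, 10, 11, 20]))
```

In an index that keeps document identifiers sorted and searchable, as the added identifier index in the second-level inverted list presumably allows, skips of this kind are what reduce the number of record accesses during a query.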
Source: Computer Engineering and Applications (《计算机工程与应用》), indexed in CSCD and the Peking University Core Journals list, 2008, No. 5, pp. 180-183 (4 pages)
Funding: National High-Tech Research and Development Plan of China (863 Program), Grant No. 2006AA01Z140
Keywords: algorithms; indexing; n-gram; inverted index