
Compression of inverted indices and its implementation in RDBMS full-text information retrieval
Abstract: A method is proposed for compressing inverted indices that provides random access to the compressed data while maintaining a high compression ratio. The compressed data are divided into two parts: the first records how many times a word occurs in each sub-interval, and the second records the exact positions of the word within those sub-intervals. The retrieval process is described in detail: using the information in the first part, decompression can start directly at any position in the second part, which is what gives the method its random-access capability. The compression ratio and retrieval efficiency are analyzed. The paper then discusses how the method is implemented in the full-text retrieval component of an RDBMS (relational database management system), including how the compressed index is stored in table form, and optimizes the query algorithm for multi-keyword retrieval. On the one hand, this implementation takes full advantage of the database system and achieves good dynamic performance; on the other hand, it reduces the space required by the inverted index and improves retrieval efficiency.
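The abstract does not specify the concrete codes used for either part, so the following Python sketch only illustrates the general two-part idea under stated assumptions: sub-intervals have a fixed size of 2^k positions, part one is a plain list of per-sub-interval counts, and part two packs each offset into a fixed-width k-bit field. Because every field has the same width, the bit offset of sub-interval j in part two is k times the number of occurrences before j, which can be computed from part one alone; that is what allows decoding to start anywhere in part two. The class `TwoPartPostings`, the function `and_query`, and the parameters `universe` and `k` are hypothetical names, not taken from the paper, and the multi-keyword skip shown here is one assumed reading of the optimization the abstract mentions.

```python
from itertools import accumulate


class TwoPartPostings:
    """Hypothetical two-part compressed postings list (illustrative sketch only).

    Part one (self.counts): occurrences of the word in each sub-interval of 2**k positions.
    Part two (self.part2): each occurrence's offset inside its sub-interval, packed as
    a fixed-width k-bit field, in increasing position order.
    """

    def __init__(self, positions, universe, k):
        self.k = k
        size = 1 << k
        self.counts = [0] * ((universe + size - 1) // size)  # part one
        buf, nbits = 0, 0  # part two is built in a Python int used as a bit buffer
        for p in sorted(set(positions)):  # duplicates are ignored
            if not 0 <= p < universe:
                raise ValueError(f"position {p} outside [0, {universe})")
            j, off = divmod(p, size)
            self.counts[j] += 1
            buf |= off << nbits  # append k bits, least significant bit first
            nbits += k
        self.part2 = buf.to_bytes((nbits + 7) // 8, "little")
        # prefix[j] = occurrences before sub-interval j, derived from part one only
        self.prefix = [0, *accumulate(self.counts)]

    def _field(self, i):
        """Read the i-th k-bit field by touching only the bytes that contain it."""
        start = i * self.k
        lo, hi = start // 8, (start + self.k + 7) // 8
        chunk = int.from_bytes(self.part2[lo:hi], "little")
        return (chunk >> (start % 8)) & ((1 << self.k) - 1)

    def sub_interval(self, j):
        """Random access: decode sub-interval j without decoding anything before it."""
        base, first = j << self.k, self.prefix[j]
        return [base + self._field(i) for i in range(first, first + self.counts[j])]

    def positions(self):
        return [p for j in range(len(self.counts)) for p in self.sub_interval(j)]


def and_query(lists):
    """Positions shared by all lists (all built with the same universe and k).

    Sub-intervals where any word has a zero count are skipped using part one,
    so their part-two data is never decoded.
    """
    lists = sorted(lists, key=lambda pl: pl.prefix[-1])  # rarest word first
    result = []
    for j in range(len(lists[0].counts)):
        if any(pl.counts[j] == 0 for pl in lists):
            continue
        common = set(lists[0].sub_interval(j))
        for pl in lists[1:]:
            common &= set(pl.sub_interval(j))
            if not common:
                break
        result.extend(sorted(common))
    return result


if __name__ == "__main__":
    a = TwoPartPostings([3, 17, 18, 40, 63], universe=64, k=4)
    b = TwoPartPostings([5, 17, 40, 41], universe=64, k=4)
    print(a.counts)            # [1, 2, 1, 1]
    print(b.sub_interval(2))   # [40, 41]
    print(and_query([a, b]))   # [17, 40]
```

In this sketch each occurrence costs k bits in part two instead of the log2(universe) bits a plain position would need, while part one costs one count per sub-interval; choosing k therefore trades the size of part one against the size of part two. A fuller implementation could also compress part one with a variable-length integer code, at the cost of needing an extra prefix-sum step before addressing part two.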
Authors: Zhu Hong (朱虹), Wu Lin (吴林)
Source: Journal of Huazhong University of Science and Technology (Natural Science Edition), 2005, No. 4, pp. 7-9 (3 pages). Indexed in EI, CAS, CSCD; Peking University core journal.
Funding: Hubei Provincial Key Science and Technology Program (2002AA103A06).
Keywords: full-text information retrieval; inverted indices; index compression; integer coding