
A Variable-Length Coding Algorithm for Compressing Inverted Indexes
Abstract: The efficiency of full-text search depends on one data structure, the inverted index, and storing an inverted index requires a large amount of disk space. This paper proposes a new compression algorithm aimed mainly at the document identifiers in an inverted index. For given document collections, the information retrieval tool Terrier is used to build inverted index files whose document identifiers are compressed either with the new algorithm or with existing state-of-the-art techniques, and the resulting file sizes are compared. The experimental results show that the new algorithm achieves a better compression ratio and therefore reduces the storage space needed for the inverted index file.
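Source: Journal of Shandong University (Natural Science), 2014, No. 12, pp. 30-35 (6 pages). Indexed in CAS, CSCD, and the Peking University Core Journals list.
Funding: Fundamental Research Funds for the Central Universities (2011JBM231)
Keywords: inverted index; integer compression; index compression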
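
This record does not describe the paper's own encoding, so the following is not the authors' algorithm. As a minimal sketch of what "compressing the document identifiers" in an inverted index usually means, it assumes a common baseline: store the gaps between consecutive document IDs (d-gaps) and write each gap in variable-byte code. The function names `vbyte_encode` and `vbyte_decode` are hypothetical and are not part of Terrier.

```python
from typing import Iterable, List


def vbyte_encode(doc_ids: Iterable[int]) -> bytes:
    """Encode strictly increasing positive document IDs as variable-byte d-gaps."""
    out = bytearray()
    prev = 0
    for doc_id in doc_ids:
        gap = doc_id - prev  # small when IDs are dense, so it needs few bytes
        if gap <= 0:
            raise ValueError("document IDs must be positive and strictly increasing")
        prev = doc_id
        # Emit 7 payload bits per byte, low-order group first.
        # A set high bit means "another byte of this gap follows".
        while gap >= 0x80:
            out.append((gap & 0x7F) | 0x80)
            gap >>= 7
        out.append(gap)
    return bytes(out)


def vbyte_decode(data: bytes) -> List[int]:
    """Invert vbyte_encode: rebuild the document IDs from the encoded d-gaps."""
    doc_ids: List[int] = []
    prev = 0
    gap = 0
    shift = 0
    for byte in data:
        gap |= (byte & 0x7F) << shift
        if byte & 0x80:
            shift += 7  # more bytes belong to the current gap
        else:
            prev += gap  # last byte of the gap: add it to the running ID
            doc_ids.append(prev)
            gap = 0
            shift = 0
    return doc_ids


if __name__ == "__main__":
    postings = [3, 7, 130, 131, 20000]
    encoded = vbyte_encode(postings)
    print(len(encoded), "bytes instead of", 4 * len(postings))
    print(vbyte_decode(encoded) == postings)
```

Comparing the encoded size with a fixed four bytes per identifier mirrors, on a very small scale, the file-size comparison the abstract describes.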

References (12)

  • 1. KOBAYASHI M, TAKEDA K. Information retrieval on the web [J]. ACM Computing Surveys (CSUR), 2000, 32(2): 144-173.
  • 2. ANH V N, MOFFAT A. Improved word-aligned binary compression for text indexing [J]. IEEE Transactions on Knowledge and Data Engineering, 2006, 18(6): 857-861.
  • 3. SHIEH W Y, CHUNG C P. A statistics-based approach to incrementally update inverted files [J]. Information Processing & Management, 2005, 41(2): 275-288.
  • 4. Wikipedia. Unary coding [EB/OL]. [2014-03-05]. http://en.wikipedia.org/wiki/Unary_coding.
  • 5. SCHOLER F, WILLIAMS H E, YIANNIS J, et al. Compression of inverted indexes for fast query evaluation [C]//Proceedings of the 25th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval. New York: ACM, 2002: 222-229.
  • 6. ELIAS P. Universal codeword sets and representations of the integers [J]. IEEE Transactions on Information Theory, 1975, 21(2): 194-203.
  • 7. RICE R F. Some practical universal noiseless coding techniques [D]. Pasadena: California Institute of Technology, 1979.
  • 8. MOFFAT A, STUIVER L. Binary interpolative coding for effective index compression [J]. Information Retrieval, 2000, 3(1): 25-47.
  • 9. HEMAN S. Super-scalar database compression between RAM and CPU-cache [D]. Amsterdam: University of Amsterdam, 2005.
  • 10. SOMASUNDARAM K, DOMNIC S. Extended Golomb code for integer representation [J]. IEEE Transactions on Multimedia, 2007, 9(2): 239-246.
