Abstract
In full-text retrieval systems, keyword-based indexes are an important technique for efficient information retrieval, but the index structures that store a text and its keywords require a large amount of space. Existing bitmap indexes cannot support queries based on the quantity of keywords, and inverted files need a large amount of storage space. This paper presents a compression method for an index structure called frequency vectors, and designs and implements a storage structure based on that method. Theoretical analysis gives an upper bound on the compression ratio and shows that the method and storage structure can achieve a high compression ratio. Query processing on the compressed frequency vectors is also discussed. Experimental results indicate that the compressed index guarantees complete query results and effectively improves the storage and query efficiency of frequency vectors.
Source
《计算机工程与应用》
CSCD
PKU Core Journal (北大核心)
2007, No. 8, pp. 149-153 (5 pages)
Computer Engineering and Applications
Keywords
frequency vectors
compression
discretization
query processing
inverted index