摘要
为解决全文索引的索引结构压缩问题,提出了文本的基于正规哈夫曼编码小波树形式,并将该结构与后缀数组结合,实现了基于正规哈夫曼编码的小波树和高效构造算法。实验结果表明,在不降低运行效率的前提下,存储空间得到有效的压缩,从而证明了改进方法的有效性。
To solve the problem of index structures compression of full-text indexes, we introduce the canonical Huffman code to encode the BWT ( Burrows-Wheeler Transform) of a text. In the end, we present an efficient construction algorithm for this index, which is on-line and linear. Experimental results show that, without reducing the efficiency, the effective storage space compression is gained, which improves the effectiveness of the method.
出处
《吉林大学学报(信息科学版)》
CAS
2011年第5期473-476,共4页
Journal of Jilin University(Information Science Edition)
基金
吉林省教育厅科技发展规划基金资助项目(2008158)
关键词
全文索引
压缩
正规哈夫曼编码
full-text indexes
compressed
canonical Huffman code