Abstract
This paper presents a compression model for Chinese full-text retrieval systems that adapts the conventional LZW method to source text written in large-character-set languages. General-purpose compression utilities are poorly suited to Chinese text because of its large alphabet. The key technique is the introduction of cut markers ("Chinese word segment signs"), which keep the nodes of the multiway dictionary tree from growing without bound and so reduce the size of the dictionary. Retrieval is performed directly on the compressed file, which can considerably improve retrieval efficiency.
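The idea of bounding dictionary growth with cut markers can be illustrated with a minimal sketch. It is not the authors' implementation. The abstract does not define the markers or the tree layout, so the sketch assumes that common punctuation marks act as cut markers (`CUT_MARKS`, hypothetical), that a flat Python dict stands in for the multiway dictionary tree, and that the single-character alphabet is collected from the text and handed to the decoder. Searching directly in the compressed file is not covered.

```python
# Sketch: character-level LZW in which dictionary phrases never span a cut marker.
# CUT_MARKS is an assumed marker set; the paper's actual markers may differ.
CUT_MARKS = frozenset("，。；：！？、 \n")


def compress(text, cut_marks=CUT_MARKS):
    """Return (alphabet, codes). Phrases are never extended across a cut marker."""
    alphabet = sorted(set(text))
    table = {ch: i for i, ch in enumerate(alphabet)}  # seed with single characters
    codes = []
    phrase = ""
    for ch in text:
        candidate = phrase + ch
        if candidate in table:
            phrase = candidate
            continue
        codes.append(table[phrase])
        # Grow the dictionary only if neither side of the join is a cut marker.
        if phrase not in cut_marks and ch not in cut_marks:
            table[candidate] = len(table)
        phrase = ch
    if phrase:
        codes.append(table[phrase])
    return alphabet, codes


def decompress(alphabet, codes, cut_marks=CUT_MARKS):
    """Invert compress() by rebuilding the same dictionary one step behind."""
    if not codes:
        return ""
    table = list(alphabet)
    prev = table[codes[0]]
    out = [prev]
    for code in codes[1:]:
        if code < len(table):
            entry = table[code]
        elif code == len(table):
            entry = prev + prev[0]  # phrase the encoder defined in the previous step
        else:
            raise ValueError("invalid code stream")
        # Mirror the encoder's rule for when a new phrase was added.
        if prev not in cut_marks and entry[0] not in cut_marks:
            table.append(prev + entry[0])
        out.append(entry)
        prev = entry
    return "".join(out)


sample = "中文信息检索，中文信息处理。中文信息检索系统。"
alphabet, codes = compress(sample)
assert decompress(alphabet, codes) == sample
```

Because no phrase may contain a marker, every dictionary entry stays inside one marker-delimited segment. That is one plausible way to limit how far the dictionary can branch when the alphabet has thousands of characters.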
Source
Journal of Chinese Information Processing (《中文信息学报》)
CSCD
PKU Core Journal (北大核心)
2000, No. 4, pp. 42-47 (6 pages)
Funding
National 863 Program (863-306-ZD03-04-1)