Abstract
To achieve a high compression ratio for Chinese text, the compression algorithm based on variable-length encoding-set expansion exploits features of the Chinese language. It uses variable-length, high-probability Chinese character strings as coding units and defines a fixed dictionary set of such strings. It then encodes the text by searching for the matching that yields the highest compression ratio. The encoding process adapts to the partial Markov-process behavior of natural language, treating Chinese character strings as a Markov message source. The resulting compression ratio is high, and it stays evenly distributed across texts of different lengths and writing styles.
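The abstract describes encoding Chinese text with a fixed dictionary of variable-length, high-frequency strings and choosing the matching that maximizes compression. The paper's dictionary contents, code assignment, and matching rule are not given here. The following Python sketch therefore makes several assumptions: a tiny hypothetical dictionary (`PHRASES`), one fixed-width integer code per entry, and single characters as a fallback. It also assumes a dynamic-programming parse that covers the text with the fewest codes. It illustrates the general idea of dictionary-based phrase coding and is not the authors' algorithm.

```python
from typing import Dict, List, Tuple

# Hypothetical fixed dictionary of variable-length, high-frequency strings.
# The paper's actual dictionary set is not reproduced here.
PHRASES = ["中华人民共和国", "人民", "共和国", "中国", "压缩", "文本", "中文", "算法"]


def make_codebook(phrases: List[str], alphabet: str) -> Tuple[Dict[str, int], List[str]]:
    """Assign an integer code (assumed fixed-width) to every phrase and every single character."""
    entries: List[str] = []
    seen = set()
    for item in list(phrases) + sorted(set(alphabet)):
        if item not in seen:
            seen.add(item)
            entries.append(item)
    return {s: i for i, s in enumerate(entries)}, entries


def encode(text: str, code: Dict[str, int]) -> List[int]:
    """Cover `text` with the fewest dictionary entries using dynamic programming.

    Under the fixed-width-code assumption, fewer entries means smaller output,
    so this parse maximizes the compression ratio for the given dictionary.
    """
    max_len = max(len(s) for s in code)
    n = len(text)
    INF = float("inf")
    best = [INF] * (n + 1)  # best[i]: fewest codes needed to cover text[i:]
    choice = [0] * (n + 1)  # length of the entry chosen at position i
    best[n] = 0
    for i in range(n - 1, -1, -1):
        for length in range(1, min(max_len, n - i) + 1):
            piece = text[i:i + length]
            if piece in code and best[i + length] + 1 < best[i]:
                best[i] = best[i + length] + 1
                choice[i] = length
    if best[0] == INF:
        raise ValueError("text contains characters outside the dictionary")
    out: List[int] = []
    i = 0
    while i < n:
        out.append(code[text[i:i + choice[i]]])
        i += choice[i]
    return out


def decode(codes: List[int], entries: List[str]) -> str:
    """Map each code back to its dictionary string and concatenate."""
    return "".join(entries[c] for c in codes)


text = "中华人民共和国的中文文本压缩算法"  # 16 characters
code, entries = make_codebook(PHRASES, text)
codes = encode(text, code)
print(len(text), len(codes), decode(codes, entries) == text)
```

Because every code has the same width under this assumption, minimizing the number of codes maximizes the compression ratio. A greedy longest-match parse is simpler but can use more codes. For example, with entries `ab`, `bcd`, and single letters, greedy parses `abcd` as `ab`, `c`, `d` (3 codes) instead of `a`, `bcd` (2 codes).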
Source
Transactions of Beijing Institute of Technology (《北京理工大学学报》)
Indexed in: EI, CAS, CSCD, Peking University Core Journals (北大核心)
2001, No. 4, pp. 480-484 (5 pages)