Abstract
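Drawing on three structural features that distinguish Chinese text from English text (the encoding of Chinese characters, the large character set, and the short length of repeated strings), the LZW algorithm is modified in three respects: how data is read, the base code set, and how dictionary code values are output. The improved algorithm achieves a compression ratio on Chinese text that is on average 19% higher than that of LZW19, with compression and decompression speeds comparable to the latter. For longer Chinese texts, its average compression ratio approaches or exceeds that of the compression software WinRAR.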
This paper presents a compression algorithm for Chinese text that improves on the LZW algorithm. By modifying the LZW algorithm's dictionary size, basic set, and dictionary code output method, the improved algorithm, LZW_CH, achieves a compression ratio about 19% higher than that of LZW19 at almost the same execution speed. LZW_CH requires no preprocessing of the data to be compressed. As a single compression algorithm, LZW_CH's compression of long Chinese text approaches or exceeds that of the professional compression utility WinRAR.
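For orientation, the following is a minimal sketch of textbook LZW, the baseline that the abstract says is modified. It is not the paper's LZW_CH: the function names, the `alphabet` parameter, the sample string, and the choice of the `gbk` codec are all assumptions made for illustration. It only shows why the unit in which data is read matters: the same routine can consume raw bytes (two bytes per Chinese character under GBK) or whole characters.

```python
def lzw_compress(symbols, alphabet):
    """Textbook LZW over a symbol sequence; returns a list of integer codes.

    `alphabet` is a hypothetical parameter: the ordered, duplicate-free base
    set that seeds the dictionary. Every input symbol must appear in it.
    """
    table = {(s,): i for i, s in enumerate(alphabet)}
    codes = []
    current = ()
    for s in symbols:
        candidate = current + (s,)
        if candidate in table:
            current = candidate            # keep extending the match
        else:
            codes.append(table[current])   # emit the longest known string
            table[candidate] = len(table)  # learn the new string
            current = (s,)
    if current:
        codes.append(table[current])
    return codes


def lzw_decompress(codes, alphabet):
    """Inverse of lzw_compress for the same base set; returns a list of symbols."""
    table = {i: (s,) for i, s in enumerate(alphabet)}
    if not codes:
        return []
    previous = table[codes[0]]
    out = list(previous)
    for code in codes[1:]:
        if code in table:
            entry = table[code]
        elif code == len(table):
            # Code refers to the string being defined in this very step.
            entry = previous + (previous[0],)
        else:
            raise ValueError(f"invalid LZW code: {code}")
        out.extend(entry)
        table[len(table)] = previous + (entry[0],)
        previous = entry
    return out


# Illustrative input only; "gbk" is an assumed encoding, not taken from the paper.
text = "中文文本压缩中文文本压缩"
byte_alphabet = range(256)            # classic 256-entry byte base set
char_alphabet = sorted(set(text))     # hypothetical character-level base set

byte_codes = lzw_compress(text.encode("gbk"), byte_alphabet)
char_codes = lzw_compress(text, char_alphabet)
print(len(byte_codes), len(char_codes))
```

A real encoder would also have to decide how many bits each code occupies and how the base set is shared with the decoder; the abstract names these as areas the paper changes, but this sketch leaves them out.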
Source
Computer Engineering and Applications (《计算机工程与应用》)
CSCD
2014, No. 3, pp. 112-116 (5 pages)
Funding
Central South University Free Exploration Program (No. 201011200121)
Keywords
Chinese text
data compression
compression algorithm
encoding
LZW