
Chinese text compression algorithm based on PDC coding

Cited by: 1
Abstract: Traditional compression algorithms handle Chinese text poorly because they do not account for its structural characteristics. To address this, the paper proposes and implements a Chinese text compression algorithm based on PDC coding. The algorithm is dictionary-based. Each individual Chinese character is given a variable-length prefix code using Huffman coding, according to the probability with which that character appears in Chinese text. The algorithm then defines the depth of the words and phrases that begin with (are prefixed by) a given character, and assigns local fixed-length codes to the words and phrases that share the same prefix and the same depth; together these codes form the compression dictionary. Compressing the same texts with this algorithm and with the traditional LZW and LZSS algorithms shows a compression-ratio improvement of 2.53% to 40.48%, indicating that the proposed algorithm compresses Chinese text more effectively.
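The abstract only outlines the scheme, so the following is a hypothetical Python sketch of the idea, not the author's implementation. It assumes: (1) "depth" means the number of characters in a phrase, i.e. its depth in a trie rooted at the prefix character; (2) each emitted token is the Huffman code of its first character, then a fixed-width depth field (`DEPTH_BITS`, a made-up parameter), then a local fixed-length index inside its (prefix, depth) group; (3) the phrase dictionary is supplied by the caller and the text is parsed by greedy longest match; (4) character frequencies come from the text being compressed. All function names are illustrative.

```python
import heapq
from collections import Counter
from collections.abc import Iterable
from itertools import count

DEPTH_BITS = 3  # hypothetical fixed-width depth field: encodes depths 1..8
Groups = dict[tuple[str, int], list[str]]


def huffman_codes(freqs: dict[str, int]) -> dict[str, str]:
    """Standard Huffman coding: map each character to a prefix-free bit string."""
    if len(freqs) == 1:  # degenerate case: only one distinct character
        return {next(iter(freqs)): "0"}
    tie = count()  # tie-breaker so heapq never has to compare the dicts
    heap = [(f, next(tie), {ch: ""}) for ch, f in freqs.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        f1, _, left = heapq.heappop(heap)
        f2, _, right = heapq.heappop(heap)
        merged = {ch: "0" + code for ch, code in left.items()}
        merged.update({ch: "1" + code for ch, code in right.items()})
        heapq.heappush(heap, (f1 + f2, next(tie), merged))
    return heap[0][2]


def index_bits(n: int) -> int:
    """Width of a local fixed-length index for a group of n entries: ceil(log2 n)."""
    return (n - 1).bit_length()


def build_groups(phrases: Iterable[str]) -> Groups:
    """Group phrases by (prefix character, depth); depth is assumed to be phrase length."""
    groups: Groups = {}
    for p in sorted(set(phrases)):
        groups.setdefault((p[0], len(p)), []).append(p)
    return groups


def longest_match(text: str, i: int, groups: Groups, max_depth: int) -> tuple[str, list[str]]:
    """Greedy longest dictionary match starting at text[i]."""
    for depth in range(min(max_depth, len(text) - i), 0, -1):
        cand = text[i:i + depth]
        group = groups.get((cand[0], depth), [])
        if cand in group:
            return cand, group
    raise ValueError(f"no depth-1 entry for {text[i]!r}")


def encode(text: str, phrases: Iterable[str]) -> tuple[str, dict[str, str], Groups]:
    """Encode text as Huffman(prefix) + depth field + local fixed-length index per token."""
    if not text:
        return "", {}, {}
    max_len = 2 ** DEPTH_BITS  # longest depth the fixed-width field can express
    groups = build_groups(p for p in phrases if 2 <= len(p) <= max_len)
    for ch in set(text):  # depth-1 entries: every character can stand alone
        groups[(ch, 1)] = [ch]
    huff = huffman_codes(Counter(text))  # per-character frequencies from the text itself
    max_depth = max(d for _, d in groups)
    out, i = [], 0
    while i < len(text):
        phrase, group = longest_match(text, i, groups, max_depth)
        depth, width = len(phrase), index_bits(len(group))
        out.append(huff[phrase[0]])                       # variable-length Huffman prefix
        out.append(format(depth - 1, f"0{DEPTH_BITS}b"))  # fixed-width depth field
        if width:
            out.append(format(group.index(phrase), f"0{width}b"))  # local fixed-length index
        i += depth
    return "".join(out), huff, groups


def decode(bits: str, huff: dict[str, str], groups: Groups) -> str:
    """Invert encode() using the same Huffman table and grouped dictionary."""
    inverse = {code: ch for ch, code in huff.items()}
    out, i = [], 0
    while i < len(bits):
        j = i + 1
        while bits[i:j] not in inverse:  # Huffman codes are prefix-free
            if j > len(bits):
                raise ValueError("truncated bit stream")
            j += 1
        prefix = inverse[bits[i:j]]
        depth = int(bits[j:j + DEPTH_BITS], 2) + 1
        j += DEPTH_BITS
        group = groups[(prefix, depth)]
        width = index_bits(len(group))
        out.append(group[int(bits[j:j + width], 2)] if width else group[0])
        i = j + width
    return "".join(out)


if __name__ == "__main__":
    text = "中文文本压缩算法是中文文本处理的基础"
    phrases = ["中文", "中国", "中文文本", "文本", "文字", "压缩", "压缩算法", "算法", "处理", "基础"]
    bits, huff, groups = encode(text, phrases)
    assert decode(bits, huff, groups) == text
    print(f"{len(bits)} bits vs {16 * len(text)} bits at 16 bits/char")
```

Grouping by (prefix, depth) is what keeps the fixed-length part short in this sketch: a phrase only has to be distinguished from phrases that share its first character and its length, so a group of n entries needs ceil(log2 n) bits. The 16-bits-per-character baseline assumes a two-byte encoding such as GB2312; the sketch does not reproduce the paper's LZW/LZSS comparison.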
Author: Zeng Dangquan (曾党泉)
Source: Computer Engineering and Applications (《计算机工程与应用》), 2015, No. 17, pp. 205-209, 227 (6 pages). Indexed in CSCD; Peking University core journal.
Keywords: Chinese text; compression algorithm; prefix; depth; coding; compression ratio