
COX: Chinese-oriented XML compressor with high compression ratio (times cited: 2)
Abstract: Commonly used XML compression algorithms do not take the characteristics of Chinese text into account. In particular, they do not distinguish between Chinese characters and English words. Drawing on the characteristics of both Chinese and XML, this paper presents COX, a Chinese-oriented XML compressor with a high compression ratio. The input documents are first preprocessed with Chinese word segmentation. Word frequencies are then counted to obtain a sorted dictionary, and high-frequency and long words are encoded following the idea of Huffman coding. After the XML document is parsed, its items are classified, and items of the same class are placed in the same container. In addition, COX gives numerical data special treatment. Experimental results show that, compared with general-purpose compression software, COX achieves a higher compression ratio when the XML documents contain more Chinese words, at the cost of slower compression and decompression.
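A minimal Python sketch of the segmentation, frequency-sorted dictionary, and Huffman coding steps described above, offered as an illustration rather than COX's actual implementation. Assumptions: the paper does not name its segmenter, so a forward-maximum-matching segmenter over a tiny hypothetical lexicon (`LEXICON`) stands in for a real Chinese word-segmentation tool. Also, where COX restricts dictionary coding to high-frequency and long words, this sketch simply assigns a Huffman code to every token. All function names are illustrative.

```python
import heapq
from collections import Counter
from itertools import count

# Hypothetical toy lexicon; a real system would use a full Chinese
# word-segmentation tool (the abstract does not say which one COX uses).
LEXICON = {"压缩", "文档", "中文", "数据压缩"}
MAX_WORD_LEN = max(len(w) for w in LEXICON)

def segment(text):
    """Forward maximum matching: at each position take the longest lexicon
    word, falling back to a single character."""
    tokens, i = [], 0
    while i < len(text):
        for size in range(min(MAX_WORD_LEN, len(text) - i), 0, -1):
            piece = text[i:i + size]
            if size == 1 or piece in LEXICON:
                tokens.append(piece)
                i += size
                break
    return tokens

def huffman_codes(freq):
    """Build a prefix-free Huffman code (token -> bit string) from frequencies."""
    if not freq:
        return {}
    if len(freq) == 1:  # a single distinct token still needs a 1-bit code
        return {tok: "0" for tok in freq}
    tie = count()  # unique tie-breaker so heapq never compares the dicts
    heap = [(f, next(tie), {tok: ""}) for tok, f in freq.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        f1, _, left = heapq.heappop(heap)
        f2, _, right = heapq.heappop(heap)
        merged = {t: "0" + c for t, c in left.items()}
        merged.update({t: "1" + c for t, c in right.items()})
        heapq.heappush(heap, (f1 + f2, next(tie), merged))
    return heap[0][2]

def encode(text):
    """Segment, build a frequency-sorted dictionary, and Huffman-code the tokens."""
    tokens = segment(text)
    freq = Counter(tokens)
    # Dictionary sorted by descending frequency, ties broken by code point order.
    dictionary = sorted(freq, key=lambda t: (-freq[t], t))
    codes = huffman_codes(freq)
    bits = "".join(codes[t] for t in tokens)
    return dictionary, codes, bits

def decode(bits, codes):
    """Emit a token whenever the accumulated bits match a code."""
    reverse = {c: t for t, c in codes.items()}
    out, cur = [], ""
    for b in bits:
        cur += b
        if cur in reverse:
            out.append(reverse[cur])
            cur = ""
    return "".join(out)
```

For example, `encode("中文文档压缩,中文文档数据压缩")` segments the input into seven tokens, places 中文 and 文档 (each seen twice) at the head of the dictionary, and `decode` restores the original string. Greedy decoding is safe here only because Huffman codes are prefix-free.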
Source: Computer Engineering and Applications (CSCD), 2012, No. 17, pp. 143-147 (5 pages)
Funding: Joint Fund of the National Natural Science Foundation of China and the China Academy of Engineering Physics (No. 10876012)
Keywords: Chinese XML document; data compression; Chinese word segmentation; dictionary
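The abstract's container step, in which items of the same class go into the same container, follows the idea popularized by XMill: grouping values of the same kind lets similar data compress together. The Python sketch below illustrates that idea together with special handling of numbers, under assumptions that are not taken from the paper. It keys containers by element tag path (a hypothetical choice of "class"). It stores canonical non-negative integers as unsigned LEB128 varints in separate numeric containers. It uses `zlib` as a stand-in back-end coder in place of COX's own dictionary coding. It also ignores attributes, mixed content, and the structure stream a real decompressor would need.

```python
import re
import xml.etree.ElementTree as ET
import zlib
from collections import defaultdict

# Only canonical non-negative integers count as numeric; "007" or "-5" stay
# text so that converting to an int and back never changes the value.
INT_RE = re.compile(r"0|[1-9][0-9]*")

def varint(n):
    """Unsigned LEB128: 7 bits per byte, high bit set on all but the last byte."""
    out = bytearray()
    while True:
        byte = n & 0x7F
        n >>= 7
        if n:
            out.append(byte | 0x80)
        else:
            out.append(byte)
            return bytes(out)

def split_into_containers(xml_text):
    """Group element text by tag path; integers go to separate binary containers."""
    root = ET.fromstring(xml_text)
    text_containers = defaultdict(list)
    num_containers = defaultdict(bytearray)

    def walk(elem, path):
        path = f"{path}/{elem.tag}"
        value = (elem.text or "").strip()
        if value:
            if INT_RE.fullmatch(value):
                num_containers[path] += varint(int(value))
            else:
                text_containers[path].append(value)
        for child in elem:
            walk(child, path)

    walk(root, "")
    return text_containers, num_containers

def compress_containers(text_containers, num_containers):
    """Compress each container independently (zlib stands in for COX's coder)."""
    packed = {}
    for path, values in text_containers.items():
        packed[path] = zlib.compress("\0".join(values).encode("utf-8"))
    for path, data in num_containers.items():
        packed[path + "#num"] = zlib.compress(bytes(data))
    return packed
```

For a document with two `<book>` elements whose `<year>` values are 2012 and 2007, this yields one text container for `/books/book/title` and a 4-byte numeric container for `/books/book/year`. Storing both years as varints takes 4 bytes instead of the 8 bytes of their decimal text.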