Abstract
XML (extensible markup language) documents are widely used as a data exchange format between applications, and compression of XML data has become an emerging research area. This paper proposes XSLC (XML stream layered-coding compression), an algorithm that compresses and decompresses XML streams in real time. When a DTD (document type definition) is available, XSLC scans it in advance to analyze the data schema, encodes elements at the child-element level according to their parent-child relationships, and compresses the data content according to its data type. Queries can be run on the compressed document. Because compression needs only a single scan of the data, the whole process can be applied in an XML data stream environment. Experimental results show that XSLC outperforms traditional algorithms in both compression ratio and compression time.
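The abstract does not give the actual XSLC encoding, so the following is only a minimal sketch of the general idea it describes: scan the DTD first, give each child element a small code that is local to its parent, and then encode the document in a single streaming pass with a simple type-aware treatment of data. The DTD, the names `child_codes`, `LayeredEncoder`, and `encode`, and the token layout are all hypothetical and chosen for illustration.

```python
import re
from xml.sax import ContentHandler, parseString

# Hypothetical DTD, used only for illustration.
DTD = """
<!ELEMENT library (book*)>
<!ELEMENT book (title, author+, year)>
<!ELEMENT title (#PCDATA)>
<!ELEMENT author (#PCDATA)>
<!ELEMENT year (#PCDATA)>
"""

def child_codes(dtd_text):
    """Map each parent element to {child name: small integer code}.

    Simplification: only flat content models (no nested groups) are handled.
    """
    codes = {}
    for name, model in re.findall(r"<!ELEMENT\s+(\S+)\s+\(([^)]*)\)", dtd_text):
        children = [c for c in re.findall(r"[A-Za-z_][\w.-]*", model)
                    if c != "PCDATA"]
        # dict.fromkeys removes repeats while keeping declaration order.
        codes[name] = {c: i for i, c in enumerate(dict.fromkeys(children))}
    return codes

class LayeredEncoder(ContentHandler):
    """Turn SAX events into tokens in one pass over the document."""

    def __init__(self, codes):
        super().__init__()
        self.codes = codes
        self.stack = []   # names of the currently open elements
        self.buf = []     # character data collected since the last tag
        self.tokens = []

    def _flush(self):
        text = "".join(self.buf).strip()
        self.buf = []
        if not text:
            return
        # Assumed type-aware step: plain decimal integers become ints,
        # everything else stays text.
        is_int = text.isascii() and text.isdigit() and str(int(text)) == text
        self.tokens.append(("data", int(text) if is_int else text))

    def startElement(self, name, attrs):
        self._flush()
        if self.stack:
            # The code is relative to the parent, so it stays small.
            self.tokens.append(("open", self.codes[self.stack[-1]][name]))
        else:
            self.tokens.append(("root", name))
        self.stack.append(name)

    def characters(self, content):
        self.buf.append(content)

    def endElement(self, name):
        self._flush()
        self.stack.pop()
        self.tokens.append(("close",))

def encode(xml_text, dtd_text=DTD):
    handler = LayeredEncoder(child_codes(dtd_text))
    parseString(xml_text.encode("utf-8"), handler)
    return handler.tokens
```

In this sketch an element name is never stored after the root: `("open", 1)` inside `book` can only mean `author`, which is what makes parent-relative codes compact. A real compressor would additionally pack the tokens into bytes and handle attributes, nested content models, and documents that do not follow the DTD.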
Source
《计算机科学与探索》
CSCD
2010, No. 2, pp. 145-152 (8 pages)
Journal of Frontiers of Computer Science and Technology
Funding
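National Natural Science Foundation of China, No. 60673113
National High Technology Research and Development Program of China (863 Program), Nos. 2007AA01Z191 and 2009AA01Z150
Cultivation Fund of the Key Scientific and Technical Innovation Project, Ministry of Education of China, No. 708001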
Keywords
extensible markup language (XML)
compression
document type definition (DTD)
data stream