Abstract
With the development of Semantic Web technology, the volume of Resource Description Framework (RDF) data is growing rapidly, and so are its demands on storage space and transmission bandwidth. Existing general-purpose compression methods and RDF-specific compression methods can address this problem, but the compressed data still contains redundancy. To this end, this paper proposes an RDF grouping compression algorithm based on delta encoding. The algorithm groups RDF data according to the combination of predicates connected to each object, which eliminates object redundancy and further reduces predicate redundancy. On this basis, it introduces delta encoding to further reduce the storage space of the subject sequences produced by grouping. Experimental results show that, compared with the Plain, HDT and HDT++ algorithms, the proposed algorithm improves compression performance by 17% on average on the less structured datasets Archives Hub, Linkedmdb, rdfabout and DBpedia, and by 23% on the highly structured dataset dbtune, which shows that it achieves good RDF compression performance on datasets with different degrees of structure.
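The abstract names two steps: grouping triples by the combination of predicates that point to each object, and delta-encoding the subject sequences that result. The sketch below illustrates both under stated assumptions: triples are already dictionary-encoded as integer IDs (as in HDT), and the function names `group_by_predicate_combination`, `delta_encode`, `compress` and `decompress`, as well as the nested-dict layout, are hypothetical illustrations rather than the authors' actual data structures, which the abstract does not specify.

```python
from collections import defaultdict
from itertools import accumulate


def group_by_predicate_combination(triples):
    """Hypothetical grouping step: objects that share the same set of incoming
    predicates go into one group, so each predicate combination is stored once
    per group and each object is stored once."""
    # object -> predicate -> set of subjects pointing to that object
    incoming = defaultdict(lambda: defaultdict(set))
    for s, p, o in triples:
        incoming[o][p].add(s)

    groups = defaultdict(dict)
    for o, by_pred in incoming.items():
        combo = tuple(sorted(by_pred))  # the object's predicate combination
        groups[combo][o] = {p: sorted(subjs) for p, subjs in by_pred.items()}
    return groups


def delta_encode(sorted_ids):
    """Keep the first ID, then store each ID as the gap to its predecessor."""
    return [x - prev for prev, x in zip([0] + sorted_ids[:-1], sorted_ids)]


def compress(triples):
    """Group triples, then delta-encode each sorted subject list.
    Layout (assumed): {combo: {object: [gaps for each predicate in combo]}}."""
    out = {}
    for combo, objects in group_by_predicate_combination(triples).items():
        out[combo] = {
            o: [delta_encode(subjs[p]) for p in combo]
            for o, subjs in sorted(objects.items())
        }
    return out


def decompress(compressed):
    """Rebuild the sorted (s, p, o) triples; prefix sums undo delta encoding."""
    triples = []
    for combo, objects in compressed.items():
        for o, gap_lists in objects.items():
            for p, gaps in zip(combo, gap_lists):
                triples.extend((s, p, o) for s in accumulate(gaps))
    return sorted(triples)


if __name__ == "__main__":
    sample = [(1, 10, 100), (4, 10, 100), (9, 10, 100), (4, 11, 100),
              (2, 10, 200), (5, 10, 200), (5, 11, 200), (3, 12, 300)]
    print(compress(sample))
```

Objects 100 and 200 both have the predicate combination (10, 11), so that combination is written once for both, and the subject list [1, 4, 9] is stored as the smaller gaps [1, 3, 5]. In practice such gaps would usually be written with a variable-length integer code to realize the space saving; that final byte-level step is omitted here.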
Authors
WU Weixin (伍伟鑫)
HAN Jingyu (韩京宇)
ZHU Man (朱曼)
School of Computer Science, Nanjing University of Posts and Telecommunications, Nanjing 210023, China
Source
Computer Engineering (《计算机工程》)
CAS
CSCD
Peking University Core Journal (北大核心)
2020, No. 11, pp. 117-123 (7 pages)
Funding
National Natural Science Foundation of China (61602260)
Key Project of the Jiangsu Provincial Social Science Foundation (18GLA004)
Keywords
Semantic Web
Resource Description Framework (RDF)
degree of structure
data compression
delta encoding