Abstract
Based on the characteristics of the terminology box (small size, stable content, high in-degree, high access frequency, and integrity), a terminology box replication preprocessing step is proposed. On top of this step, a partition framework is designed to match the properties of web-scale resource description framework (RDF) datasets. Comparative experiments against the classical multilevel graph partitioning algorithm show that the framework, which introduces terminology box redundancy and an edge weight factor, is suitable for web-scale RDF partitioning tasks and effectively reduces the edge cut at a small redundancy cost. The framework thereby provides a sound data distribution management basis for higher-level computation.
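The effect of terminology box replication on the edge cut can be illustrated with a toy example. The sketch below is not the paper's framework: the abstract gives no algorithmic details, so the `rdf:type` heuristic for picking terminology nodes, the helper names `tbox_nodes` and `edge_cut`, and the sample triples are all assumptions made for illustration, and the edge weight factor is not modeled.

```python
from typing import Dict, FrozenSet, Iterable, Set, Tuple

Triple = Tuple[str, str, str]  # (subject, predicate, object)


def tbox_nodes(triples: Iterable[Triple]) -> Set[str]:
    """Assumed heuristic: treat every object of rdf:type as a terminology node."""
    return {o for _, p, o in triples if p == "rdf:type"}


def edge_cut(
    triples: Iterable[Triple],
    assignment: Dict[str, int],
    replicated: FrozenSet[str] = frozenset(),
) -> int:
    """Count triples whose subject and object fall in different partitions.

    An edge touching a replicated node is never cut, because a copy of
    that node is assumed to exist in every partition.
    """
    cut = 0
    for s, _, o in triples:
        if s in replicated or o in replicated:
            continue
        if assignment[s] != assignment[o]:
            cut += 1
    return cut


# Hypothetical toy dataset: one high in-degree class node and four instances.
triples = [
    ("alice", "rdf:type", "Person"),
    ("bob", "rdf:type", "Person"),
    ("carol", "rdf:type", "Person"),
    ("dave", "rdf:type", "Person"),
    ("alice", "knows", "bob"),
    ("carol", "knows", "dave"),
    ("bob", "knows", "carol"),
]
# Two partitions; without replication the class node lives only in partition 0.
assignment = {"alice": 0, "bob": 0, "carol": 1, "dave": 1, "Person": 0}

cut_plain = edge_cut(triples, assignment)
cut_replicated = edge_cut(triples, assignment, frozenset(tbox_nodes(triples)))
extra_copies = len(tbox_nodes(triples)) * (2 - 1)  # one extra copy per other partition
```

In this toy case a single replicated node removes every cut edge that pointed at the class, which is the intuition behind trading a small amount of redundancy for a lower edge cut.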
Source
《华中科技大学学报(自然科学版)》
EI
CAS
CSCD
Peking University Core Journals (北大核心)
2013, Issue S2, pp. 42-47 (6 pages)
Journal of Huazhong University of Science and Technology (Natural Science Edition)
Funding
National High Technology Research and Development Program of China (2013AA013204)
National Natural Science Foundation of China (61373165)
Keywords
resource description framework (RDF)
graph partition
scale-free
terminology box
data redundancy