Abstract
Managing and exploiting the ever-growing volume of RDF data effectively is one of the challenges facing Web data management today. Clustering is commonly used to partition large-scale RDF datasets for storage and application. Existing approaches tend to apply classical clustering methods and neglect the pattern features of the RDF triples themselves. To address this problem, this paper analyzes the form of RDF clustering results in depth, defines three types of cluster patterns, and proposes a pattern-based clustering method. By redescribing the RDF dataset, the method automatically generates cluster patterns suited to the dataset's features and then clusters the data on that basis. Experiments on different test sets validate the correctness and effectiveness of the proposed method.
Source
Computer Science (《计算机科学》)
CSCD
PKU Core Journal (北大核心)
2015, Issue 10, pp. 266-270, 296 (6 pages)
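Funding
Supported by the National Natural Science Foundation of China project "Research on a Personalized Tourism Information Service Model in Cloud Computing Environments" (41271387)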
Keywords
RDF, Clustering, Linked open data, Clustering pattern