Funding: This work is supported by the “National Social Science Foundation in China” project (19BTQ061) and the “Integration and Development on A Next Generation of Open Knowledge Services System and Key Technologies” project (2020XM05).
Abstract: Purpose: The interdisciplinary nature and rapid development of the Semantic Web have led to the mass publication of RDF data in a large number of widely accepted serialization formats, creating the need for purpose-specific RDF data processing. The paper reports on an assessment of the chief RDF data endpoint challenges and introduces RDFAdaptor, a set of plugins for RDF data processing that covers the whole life cycle with high efficiency. Design/methodology/approach: RDFAdaptor is built on the prominent ETL tool Pentaho Data Integration, which provides a user-friendly and intuitive interface and allows connection to various data sources and formats. It reuses the Java framework RDF4J as middleware to access data repositories, SPARQL endpoints, and all leading RDF database solutions with SPARQL 1.1 support. It can support effortless services with various configuration templates in multi-scenario applications, and can help extend data processing tasks in other services or tools to complement missing functions. Findings: The proposed comprehensive RDF ETL solution, RDFAdaptor, provides an easy-to-use and intuitive interface, supports data integration and federation over multi-source heterogeneous repositories or endpoints, and manages linked data in hybrid storage mode. Research limitations: The plugin set supports several application scenarios of RDF data processing, but error detection and checking, and interaction with other graph repositories, remain to be improved. Practical implications: The plugin set provides a user interface and configuration templates that make it usable in various applications: RDF data generation, multi-format data conversion, remote RDF data migration, and RDF graph update in the semantic query process. Originality/value: This is the first attempt to develop components, rather than systems, that can extract, consolidate, and store RDF data on the basis of a data warehousing environment with a mature ecosystem.
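To make the "RDF data generation" scenario concrete, the following is a minimal sketch of an extract-transform step that turns tabular rows into N-Triples lines. It is not RDFAdaptor code (the plugins are built on Pentaho Data Integration and RDF4J, both Java); the namespace `http://example.org/`, the function names, and the sample data are assumptions made purely for illustration.

```python
import csv
import io

BASE = "http://example.org/"  # hypothetical namespace, not from the paper


def escape_literal(value: str) -> str:
    """Escape characters that may not appear raw in an N-Triples string literal."""
    return (value.replace("\\", "\\\\")
                 .replace('"', '\\"')
                 .replace("\n", "\\n")
                 .replace("\r", "\\r"))


def rows_to_ntriples(csv_text: str, id_column: str) -> list[str]:
    """Emit one N-Triples line per non-empty cell; the id column names the subject.

    Assumes ids and column names are already safe to embed in an IRI.
    """
    triples = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        subject = f"<{BASE}resource/{row[id_column]}>"
        for column, value in row.items():
            if column == id_column or not value:
                continue  # skip the key column and empty cells
            predicate = f"<{BASE}property/{column}>"
            triples.append(f'{subject} {predicate} "{escape_literal(value)}" .')
    return triples


# Illustrative input: two records, one with a quoted title and one missing a year.
SAMPLE = 'id,title,year\n1,"RDF ""ETL"" notes",2020\n2,Linked data,\n'
NTRIPLES = rows_to_ntriples(SAMPLE, "id")
```

In a real pipeline the resulting lines would be loaded into a repository or SPARQL endpoint; that load step is omitted here because it depends on the target store.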
Funding: This work was supported by the National Natural Science Foundation of China (Grant Nos. U1836118 and 61602350), the Key Projects of National Social Science Foundation of China (Grant No. 11&ZD189), and the Scientific Research Project of Education Department of Hubei Province (Grant No. B2019008).
Abstract: With the rapid growth of linked data on the Web, quality assessment of RDF data sets has become particularly important, especially for the quality and accessibility of the linked data. In most cases, RDF data sets are shared online, leading to a high maintenance cost for quality assessment. This also potentially pollutes Internet data. Recently, blockchain technology has shown potential in many applications. Storing quality assessment results on a blockchain can reduce reliance on a centralized authority, and the stored results are tamper-resistant. To this end, we propose an RDF data quality assessment model for a decentralized environment, pointing out a new dimension of RDF data quality. We use the blockchain to record the data quality assessment results and design a detailed update strategy for those results. We have implemented a system, DCQA, to test and verify the feasibility of the quality assessment model. The proposed method can provide users with more cost-effective results when knowledge is independently protected.
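The tamper-resistance property mentioned above can be illustrated with a hash-chained, append-only log. This is only a single-process stand-in for a blockchain, not the DCQA system: there is no consensus or distribution, and the class name `QualityLedger`, the record fields, and the "append a new block instead of overwriting" update rule are all assumptions made for this sketch.

```python
import hashlib
import json

GENESIS = "0" * 64  # placeholder previous-hash for the first block


def block_hash(index, dataset, score, prev_hash):
    """SHA-256 over the block's contents and the previous block's hash."""
    payload = json.dumps([index, dataset, score, prev_hash])
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


class QualityLedger:
    """Append-only chain of quality assessment results (hypothetical design)."""

    def __init__(self):
        self.blocks = []

    def record(self, dataset, score):
        """Append a new assessment; earlier results are never overwritten."""
        index = len(self.blocks)
        prev_hash = self.blocks[-1]["hash"] if self.blocks else GENESIS
        block = {"index": index, "dataset": dataset, "score": score,
                 "prev_hash": prev_hash,
                 "hash": block_hash(index, dataset, score, prev_hash)}
        self.blocks.append(block)
        return block

    def latest(self, dataset):
        """Return the most recent assessment block for a data set, or None."""
        for block in reversed(self.blocks):
            if block["dataset"] == dataset:
                return block
        return None

    def is_valid(self):
        """Recompute every hash; any edited block breaks the chain."""
        prev_hash = GENESIS
        for i, block in enumerate(self.blocks):
            expected = block_hash(i, block["dataset"], block["score"], prev_hash)
            if block["prev_hash"] != prev_hash or block["hash"] != expected:
                return False
            prev_hash = block["hash"]
        return True
```

An updated assessment for the same data set is appended as a new block, so the history of scores stays auditable and `latest` returns the current one.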
Funding: Supported by the National Natural Science Foundation of China under Grant Nos. 90604025 and 60773106, and the National Basic Research 973 Program of China under Grant Nos. 2003CB317007 and 2007CB310803.
Abstract: RDF is the data interchange layer for the Semantic Web. In order to manage the increasing amount of RDF data, an RDF repository should provide not only the necessary scalability and efficiency, but also sufficient inference capabilities. Though existing RDF repositories have made progress towards these goals, there is still ample room for improving the overall performance. In this paper, we propose a native RDF repository, System Π, to pursue a better tradeoff among system scalability, query efficiency, and inference capabilities. System Π takes a hypergraph representation for RDF as the data model for its persistent storage, which effectively avoids the costs of data model transformation when accessing RDF data. Based on this native storage scheme, a set of efficient semantic query processing techniques is designed. First, several indices are built to accelerate RDF data access, including a value index, a labeling scheme for transitive closure computation, and three triple indices. Second, we propose a hybrid inference strategy under the pD* semantics to support inference for OWL-Lite with a relatively low computational complexity. Finally, we extend the SPARQL algebra to explicitly express inference semantics in the logical query plan by defining some new algebra operators. In addition, MD5 hash values of URIs and a schema-level cache are introduced as practical implementation techniques. The results of performance evaluation on the LUBM benchmark and a real data set show that System Π has a better combined metric value than other comparable systems.
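Two of the techniques named above, multiple triple indices and MD5-hashed URIs, can be sketched in a few lines. This is an in-memory illustration, not the repository's hypergraph storage: the abstract does not say which three orderings are indexed, so SPO, POS, and OSP are assumed here, and for simplicity literals are hashed the same way as URIs.

```python
import hashlib
from collections import defaultdict


def term_key(term):
    """Fixed-length key for a term: the MD5 hex digest of its UTF-8 bytes."""
    return hashlib.md5(term.encode("utf-8")).hexdigest()


class TripleStore:
    """Toy store with three nested-dict indices (assumed orderings)."""

    def __init__(self):
        self.terms = {}  # key -> original term, for decoding results
        self.spo = defaultdict(lambda: defaultdict(set))
        self.pos = defaultdict(lambda: defaultdict(set))
        self.osp = defaultdict(lambda: defaultdict(set))

    def _encode(self, term):
        key = term_key(term)
        self.terms[key] = term
        return key

    def add(self, s, p, o):
        ks, kp, ko = (self._encode(t) for t in (s, p, o))
        self.spo[ks][kp].add(ko)
        self.pos[kp][ko].add(ks)
        self.osp[ko][ks].add(kp)

    def match(self, s=None, p=None, o=None):
        """Return sorted triples matching the pattern; None is a wildcard."""
        ks, kp, ko = (term_key(t) if t is not None else None for t in (s, p, o))
        if ks is not None:    # subject bound: walk the SPO index
            found = [(ks, p2, o2) for p2, objs in self.spo.get(ks, {}).items()
                     for o2 in objs]
        elif kp is not None:  # predicate bound: walk the POS index
            found = [(s2, kp, o2) for o2, subs in self.pos.get(kp, {}).items()
                     for s2 in subs]
        elif ko is not None:  # object bound: walk the OSP index
            found = [(s2, p2, ko) for s2, preds in self.osp.get(ko, {}).items()
                     for p2 in preds]
        else:                 # nothing bound: full scan
            found = [(s2, p2, o2) for s2, po in self.spo.items()
                     for p2, objs in po.items() for o2 in objs]
        hits = [t for t in found
                if (kp is None or t[1] == kp) and (ko is None or t[2] == ko)]
        return sorted(tuple(self.terms[k] for k in t) for t in hits)
```

Hashing gives every term a key of the same length regardless of how long the URI is, which keeps index entries compact; the trade-off is an extra lookup to decode results.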
Abstract: RDF is increasingly being used to encode data for the semantic web and data exchange. There have been a large number of works that address RDF data management following different approaches. In this paper we provide an overview of these works. This review considers centralized solutions (what are referred to as warehousing approaches), distributed solutions, and the techniques that have been developed for querying linked data. In each category, further classifications are provided that would assist readers in understanding the identifying characteristics of different approaches.
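Abstract: The rapid growth of Semantic Web data has made RDF data querying an important research topic. Keyword queries do not require knowledge of the data schema or of a query language, so they are better suited to ordinary users. This paper proposes KREAG (Keyword query over RDF data based on Entity-triple Association Graph), a keyword query method for RDF data. To support queries on attribute or relation names, RDF data is modeled as an entity-triple association graph with labeled vertices. This model guarantees that associations between entities in the RDF data are converted into paths between vertices in the association graph, and that all textual information is encapsulated in the vertex labels. On this basis, the keyword query problem is transformed into the problem of finding a directed Steiner tree on the association graph. With a guaranteed approximation ratio of m (where m is the number of query keywords), an approximation algorithm is used to achieve fast query response. A reasonable scoring scheme measures the relevance of query results, and top-k queries are supported. The time complexity of the algorithm is O(m·|V|), where |V| is the number of vertices in the entity-triple association graph. Experiments show that KREAG has a faster response time than other methods and can effectively perform keyword queries over RDF data.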
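The reduction to a directed Steiner tree can be illustrated with a standard shortest-path heuristic that also achieves approximation ratio m: for each keyword, compute how far every vertex is from the nearest vertex whose label contains that keyword, then rank candidate roots by the sum of those distances. This is not the KREAG algorithm itself. The graph, labels, function names, and the "fewer hops is better" score are assumptions for illustration, and this version runs m breadth-first passes, so it costs O(m·(|V|+|E|)) rather than the O(m·|V|) reported above.

```python
from collections import deque


def distances_to_keyword(graph, labels, keyword):
    """Hops from each vertex to the nearest vertex whose label contains keyword.

    Runs one multi-source BFS over the reversed edges; vertices that cannot
    reach any matching vertex are absent from the result.
    """
    reverse = {v: [] for v in labels}
    for u, successors in graph.items():
        for v in successors:
            reverse[v].append(u)
    dist = {v: 0 for v, text in labels.items() if keyword.lower() in text.lower()}
    queue = deque(dist)
    while queue:
        v = queue.popleft()
        for u in reverse[v]:
            if u not in dist:
                dist[u] = dist[v] + 1
                queue.append(u)
    return dist


def top_k_roots(graph, labels, keywords, k=1):
    """Rank roots that reach every keyword by total path length (lower is better)."""
    per_keyword = [distances_to_keyword(graph, labels, kw) for kw in keywords]
    scored = [(sum(dist[root] for dist in per_keyword), root)
              for root in labels
              if all(root in dist for dist in per_keyword)]
    return sorted(scored)[:k]


# Hypothetical labeled graph: every vertex carries its text in LABELS.
LABELS = {"paper1": "paper", "title1": "title keyword search",
          "paper2": "paper", "title2": "title RDF storage",
          "author1": "author", "name1": "name Alice"}
GRAPH = {"paper1": ["title1", "author1"], "paper2": ["title2", "author1"],
         "author1": ["name1"]}
```

The ratio-m bound follows because the optimal tree contains a path from its root to each keyword vertex that is no longer than the tree itself, so the best root's summed shortest paths cost at most m times the optimum. For the sample graph, `top_k_roots(GRAPH, LABELS, ["search", "alice"])` selects `paper1` as the only root that reaches both keywords.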