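Abstract: Using index files to assist queries on the Hadoop platform is a new approach to querying massive RDF (Resource Description Framework) data. Existing RDF query implementations on Hadoop seldom use index files, mainly target static RDF data, and accommodate dynamic data updates poorly. To overcome these two shortcomings, the IMSQ (using Index in MapReduce to Segment and Query) algorithm is proposed for distributed querying of RDF files. The algorithm has two parts, segmentation and query. First, the RDF data is partitioned once by star-shaped segmentation, which yields a number of segment files, and index files are built. Second, at query time, a join plan is generated layer by layer and a filter-and-select strategy is applied: the index files are consulted first to narrow the set of files, and then the corresponding segment files are queried. Finally, the results are merged and output. Experiments on the LUBM dataset show that IMSQ has a clear advantage in query efficiency when the data volume is large.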
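The segment-then-filter idea in the IMSQ abstract above can be illustrated with a minimal single-process sketch. It is not the authors' MapReduce implementation: the names `star_segment` and `query_star`, the toy byte-sum hash, and the predicate-to-segment index layout are all assumptions made for illustration, since the abstract does not specify them.

```python
from collections import defaultdict


def star_segment(triples, num_segments):
    """Place every triple of a subject (one 'star') in the same segment.

    Returns (segments, index). The index maps each predicate to the set of
    segment ids that contain it (a hypothetical index layout).
    """
    segments = defaultdict(list)
    index = defaultdict(set)
    for s, p, o in triples:
        # Deterministic toy hash of the subject; a real system would use a
        # proper partitioner.
        seg_id = sum(s.encode("utf-8")) % num_segments
        segments[seg_id].append((s, p, o))
        index[p].add(seg_id)
    return dict(segments), dict(index)


def query_star(segments, index, pattern):
    """Answer a star query: subjects that have every (predicate, object) edge.

    `pattern` maps predicate -> required object, or None for a variable.
    """
    # Filter step: consult the index first and keep only segments that
    # contain every predicate of the query.
    candidates = set(segments)
    for predicate in pattern:
        candidates &= index.get(predicate, set())

    # Query step: scan only the surviving segment files.
    results = []
    for seg_id in sorted(candidates):
        stars = defaultdict(set)
        for s, p, o in segments[seg_id]:
            stars[s].add((p, o))
        for s, edges in stars.items():
            if all(
                any(ep == p and (o is None or eo == o) for ep, eo in edges)
                for p, o in pattern.items()
            ):
                results.append(s)
    # Merge step: combine per-segment results into one sorted list.
    return sorted(results)


TRIPLES = [
    ("ex:alice", "rdf:type", "ub:Student"),
    ("ex:alice", "ub:takesCourse", "ex:c1"),
    ("ex:bob", "rdf:type", "ub:Professor"),
    ("ex:bob", "ub:teacherOf", "ex:c1"),
    ("ex:carol", "rdf:type", "ub:Student"),
    ("ex:carol", "ub:takesCourse", "ex:c2"),
]

segments, index = star_segment(TRIPLES, num_segments=4)
students = query_star(
    segments, index, {"rdf:type": "ub:Student", "ub:takesCourse": None}
)
```

Because a whole star lives in one segment, a star-shaped query never needs a cross-segment join in this sketch; only queries that chain several stars together would need the layered join plan the abstract mentions.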
Abstract: Peer-to-peer technologies have emerged as a powerful and scalable communication model for large-scale content sharing. However, they do not yet offer optimized management of heterogeneous aggregated content, since they lack rich semantic specifications. To overcome these shortcomings, we elaborated a reference model of a P2P architecture for the dynamic aggregation, sharing and retrieval of heterogeneous multimedia content (simple or aggregated). This architecture was mainly developed under the CAM4Home European research project and is fully based on the CAM4Home semantic metadata model. This semantic model relies on RDF (Resource Description Framework) and is rich (but simple enough), extensible and dedicated to the description of any kind of multimedia content. In this paper, we detail and evaluate an original semantic-based community network architecture for heterogeneous multimedia content sharing and retrieval. Within the presented architecture, multimedia content is managed according to its associated CAM4Home semantic metadata through a structured P2P topology. This topology relies on a semantically enhanced DHT (Distributed Hash Table) and is also provided with an additional indexing system that offers semantic storage and search facilities and overcomes the exact-match keyword limitation of DHTs.
Funding: Supported by the National Natural Science Foundation of China under Grant Nos. 90604025 and 60773106, and the National Basic Research 973 Program of China under Grant Nos. 2003CB317007 and 2007CB310803.
Abstract: RDF is the data interchange layer for the Semantic Web. In order to manage the increasing amount of RDF data, an RDF repository should provide not only the necessary scalability and efficiency, but also sufficient inference capabilities. Though existing RDF repositories have made progress towards these goals, there is still ample room for improving the overall performance. In this paper, we propose a native RDF repository, System II, to pursue a better tradeoff among system scalability, query efficiency, and inference capabilities. System II takes a hypergraph representation for RDF as the data model for its persistent storage, which effectively avoids the costs of data model transformation when accessing RDF data. Based on this native storage scheme, a set of efficient semantic query processing techniques is designed. First, several indices are built to accelerate RDF data access, including a value index, a labeling scheme for transitive closure computation, and three triple indices. Second, we propose a hybrid inference strategy under the pD* semantics to support inference for OWL-Lite with a relatively low computational complexity. Finally, we extend the SPARQL algebra to explicitly express inference semantics in the logical query plan by defining some new algebra operators. In addition, MD5 hash values of URIs and a schema-level cache are introduced as practical implementation techniques. The results of performance evaluation on the LUBM benchmark and a real data set show that System II has a better combined metric value than other comparable systems.
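Two of the implementation techniques named in the abstract above, MD5-hashed URIs and multiple triple indices, can be sketched together in a few lines. This is an in-memory illustration only, not the repository's hypergraph storage: the class name `TripleStore`, its methods, and the choice of SPO, POS and OSP as the three index orderings are assumptions, since the abstract does not say which orderings are used.

```python
import hashlib
from collections import defaultdict


def uri_key(uri: str) -> str:
    """Fixed-length key for a URI: the hex MD5 digest of its UTF-8 bytes."""
    return hashlib.md5(uri.encode("utf-8")).hexdigest()


class TripleStore:
    """Toy store with a value index and three triple indices (assumed
    orderings: SPO, POS, OSP)."""

    def __init__(self):
        self.values = {}  # value index: MD5 key -> original URI
        self.spo = defaultdict(lambda: defaultdict(set))
        self.pos = defaultdict(lambda: defaultdict(set))
        self.osp = defaultdict(lambda: defaultdict(set))

    def _intern(self, uri):
        key = uri_key(uri)
        self.values[key] = uri
        return key

    def add(self, s, p, o):
        ks, kp, ko = self._intern(s), self._intern(p), self._intern(o)
        self.spo[ks][kp].add(ko)
        self.pos[kp][ko].add(ks)
        self.osp[ko][ks].add(kp)

    def subjects(self, p, o):
        """All subjects s such that (s, p, o) is stored, via the POS index."""
        keys = self.pos.get(uri_key(p), {}).get(uri_key(o), ())
        return sorted(self.values[k] for k in keys)

    def objects(self, s, p):
        """All objects o such that (s, p, o) is stored, via the SPO index."""
        keys = self.spo.get(uri_key(s), {}).get(uri_key(p), ())
        return sorted(self.values[k] for k in keys)


store = TripleStore()
store.add("ex:alice", "rdf:type", "ub:Student")
store.add("ex:alice", "ub:takesCourse", "ex:c1")
store.add("ex:carol", "rdf:type", "ub:Student")

student_list = store.subjects("rdf:type", "ub:Student")
alice_courses = store.objects("ex:alice", "ub:takesCourse")
```

Hashing gives every URI a key of the same length regardless of how long the URI is, which keeps index entries uniform; the value index is what maps keys back to readable URIs when results are returned.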