Funding: Weaponry Equipment Pre-Research Foundation of PLA Equipment Ministry (No. 9140A06050409JB8102); Pre-Research Foundation of PLA University of Science and Technology (No. 2009JSJ11)
Abstract: To address the correctness of query processing in semantic-based relational data integration, the semantics of SPARQL (SPARQL Protocol and RDF Query Language) queries is defined. During query rewriting, all relevant tables are found and decomposed into minimal connectable units. These units are then joined according to the semantic query to produce semantically correct query plans. Algorithms for query rewriting and transformation are presented, and their computational complexity is discussed. In the worst case, the query decomposition algorithm finishes in O(n²) time and the query rewriting algorithm requires O(nm) time. Experiments verify the performance of the algorithms and show that the query processing algorithms perform satisfactorily when the query length is less than 8.
Funding: Supported by the Shanghai Jiao Tong University and IBM CRL Joint Project
Abstract: Efficient support for querying large-scale resource description framework (RDF) triples plays an important role in semantic web data management. This paper presents an efficient RDF query engine for evaluating SPARQL queries, in which an inverted index structure is used to index the RDF triples. A set of operators on the inverted index was developed for query optimization and evaluation. A main-tree-shaped optimization algorithm was then developed that transforms a SPARQL query graph into the optimal query plan by effectively reducing the search space for the optimal join order. The optimization collects a set of RDF statistics to estimate the execution cost of a query plan. Finally, the optimal query plan is evaluated with the defined operators to answer the given SPARQL query. Extensive tests were conducted on both synthetic and real datasets containing up to 100 million triples. The results show that this approach can answer most queries within 1 s and is extremely efficient and scalable in comparison with previous state-of-the-art RDF stores.
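As a rough illustration of the inverted-index idea described in this abstract, the following minimal Python sketch indexes triples by (position, term) and answers a single triple pattern by intersecting posting lists. It is not the paper's operator set or its main-tree-shaped optimizer: the names `TripleIndex`, `match`, `estimate`, and `order_patterns` are hypothetical, and the ordering step is a simple greedy heuristic that uses posting-list sizes as the only statistic.

```python
from collections import defaultdict


class TripleIndex:
    """Toy inverted index (hypothetical): (position, term) -> set of triple ids."""

    def __init__(self, triples):
        self.triples = list(triples)
        self.postings = defaultdict(set)
        for tid, (s, p, o) in enumerate(self.triples):
            self.postings[("s", s)].add(tid)
            self.postings[("p", p)].add(tid)
            self.postings[("o", o)].add(tid)

    def _lists(self, pattern):
        # One posting list per constant in the pattern; None marks a variable.
        return [self.postings.get((pos, term), set())
                for pos, term in zip("spo", pattern) if term is not None]

    def match(self, pattern):
        """Return the triples matching a pattern, in insertion order."""
        lists = sorted(self._lists(pattern), key=len)
        if not lists:
            return list(self.triples)
        ids = set(lists[0])          # start from the smallest posting list
        for other in lists[1:]:
            ids &= other
        return [self.triples[i] for i in sorted(ids)]

    def estimate(self, pattern):
        """Upper bound on the result size: the smallest posting list."""
        lists = self._lists(pattern)
        return min(len(l) for l in lists) if lists else len(self.triples)


def order_patterns(index, patterns):
    """Greedy join order (an assumption, not the paper's algorithm):
    evaluate the most selective pattern first."""
    return sorted(patterns, key=index.estimate)


triples = [
    ("alice", "knows", "bob"),
    ("bob", "knows", "carol"),
    ("alice", "age", "30"),
]
index = TripleIndex(triples)
print(index.match(("alice", "knows", None)))
print(order_patterns(index, [(None, "knows", None), (None, "age", None)]))
```

Starting the intersection from the smallest posting list keeps intermediate sets small, which is the same intuition behind using collected statistics to choose a join order.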
Abstract: Performance and scalability are two issues that are becoming increasingly pressing as the resource description framework (RDF) data model is applied to real-world applications. Because neither vertical nor flat RDF storage structures can handle frequent schema updates while also avoiding possible long-chain joins, there is no clear winner between the two typical structures. In this paper, we propose an alternative, the open user schema, which consists of flat tables automatically extracted from RDF query streams. A query is divided into two parts: one is evaluated on the flat tables in the open user schema and the other on the vertical table stored in backend storage. At the core of this divide-and-conquer architecture, an efficient isomorphic decision algorithm guides a query to the related flat tables in the open user schema. Our proposal departs from existing methods in that it can accommodate schema updates without possible long-chain joins. We implement our approach and provide empirical evaluations that demonstrate its efficiency and effectiveness in evaluating complex RDF queries.
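The divide-and-conquer idea in this abstract can be sketched, under strong simplifying assumptions, as follows. The sketch replaces the paper's isomorphic decision algorithm with a plain column-membership check, and every name in it (`flat_table`, `vertical_lookup`, `answer`) is hypothetical: properties covered by a flat table are read from one row, and the remainder falls back to the vertical triple table.

```python
def vertical_lookup(triples, subject, prop):
    """Scan the vertical (subject, property, object) table for matching objects."""
    return [o for s, p, o in triples if s == subject and p == prop]


def answer(subject, props, flat_table, triples):
    """Split the requested properties between a flat table and the vertical table.

    Assumption: routing is decided by column membership only; the paper uses
    an isomorphic decision algorithm for this step instead.
    """
    row = flat_table["rows"].get(subject, {})
    result = {}
    for prop in props:
        if prop in flat_table["columns"]:
            # Covered part: a single row lookup, no join needed.
            result[prop] = row.get(prop)
        else:
            # Uncovered part: fall back to the vertical triple table.
            values = vertical_lookup(triples, subject, prop)
            result[prop] = values[0] if values else None
    return result


flat_table = {
    "columns": {"name", "email"},
    "rows": {"u1": {"name": "Alice", "email": "alice@example.org"}},
}
triples = [("u1", "name", "Alice"), ("u1", "homepage", "http://example.org/alice")]

print(answer("u1", ["name", "homepage"], flat_table, triples))
```

Because a new property simply lands in the vertical table until a flat table covers it, a layout like this can absorb schema changes without rebuilding the flat tables.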