A resource description framework (RDF) stream is useful for modeling spatio-temporal data. In this paper, we propose LRSP, a framework for large-scale RDF stream processing, to process general continuous queries over large-scale RDF streams. First, we propose a formalization (named CT-SPARQL) to represent general continuous queries in a unified, unambiguous way. Second, based on this formalization, we propose LRSP to process continuous queries in a common white-box way by separating RDF stream processing, query parsing, and query execution. Finally, we implement LRSP and evaluate it against popular continuous query engines on benchmark datasets and real-world datasets. Owing to the architecture of LRSP, many efficient query engines for RDF (both centralized and distributed) can be directly employed to process continuous queries. The experimental results show that LRSP achieves higher performance, especially when processing large-scale real-world data.
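The separation of stream processing from query execution described in this abstract can be illustrated with a minimal Python sketch. It is not LRSP and does not reproduce CT-SPARQL, whose syntax the abstract does not give. The function names (`window`, `match`, `continuous_query`), the integer timestamps, and the sample bus data are all assumptions made for illustration. The idea shown is only that a time window turns the stream into a static snapshot, which an ordinary (non-streaming) matcher can then query at each tick.

```python
from typing import Dict, Iterable, List, Optional, Tuple

Triple = Tuple[str, str, str]
TimedTriple = Tuple[Triple, int]  # (triple, timestamp); hypothetical stream element
Pattern = Tuple[Optional[str], Optional[str], Optional[str]]  # None acts as a wildcard


def window(stream: Iterable[TimedTriple], now: int, width: int) -> List[Triple]:
    """Stream-processing step: keep triples with now - width < timestamp <= now."""
    return [triple for triple, ts in stream if now - width < ts <= now]


def match(snapshot: Iterable[Triple], pattern: Pattern) -> List[Triple]:
    """Query-execution step: match one triple pattern against a static snapshot."""
    return [
        triple
        for triple in snapshot
        if all(p is None or p == v for p, v in zip(pattern, triple))
    ]


def continuous_query(
    stream: List[TimedTriple], pattern: Pattern, width: int, ticks: Iterable[int]
) -> Dict[int, List[Triple]]:
    """Re-evaluate the static query over the current window at every tick."""
    return {now: match(window(stream, now, width), pattern) for now in ticks}


# Hypothetical spatio-temporal stream: bus positions over time.
stream = [
    (("bus1", "locatedAt", "stopA"), 1),
    (("bus2", "locatedAt", "stopB"), 3),
    (("bus1", "locatedAt", "stopC"), 6),
]

results = continuous_query(stream, ("bus1", "locatedAt", None), width=5, ticks=[5, 10])
for tick, triples in results.items():
    print(tick, triples)
```

Because `match` only ever sees a plain list of triples, it could in principle be swapped for any static RDF query engine, which is the kind of reuse the abstract attributes to the LRSP architecture.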
To address the problem that only some types of SPARQL (SPARQL Protocol and RDF Query Language) queries can be answered by the current resource description framework link traversal based query execution (RDF-LTE) approach, this paper uses concrete SPARQL queries to discuss how the execution order of triple patterns affects query results and cost, and analyzes two properties of the web of linked data: missing backward links and missing contingency solution. Three heuristic principles for logical query plan optimization are then proposed, namely the filtered basic graph pattern (FBGP) principle, the triple pattern chain principle, and the seed URIs principle. The three principles help reduce the number of intermediate solutions and increase the types of queries that can be answered. The effectiveness and feasibility of the proposed approach are evaluated. The experimental results show that more query results can be returned at lower cost, thus enabling users to develop the full potential of the web of linked data.
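The claim that the execution order of triple patterns affects the results of link traversal can be shown with a toy Python simulation. This is a sketch under stated assumptions, not the RDF-LTE implementation and not the three heuristic principles, whose definitions the abstract does not give. The `WEB` dictionary, the `ex:` URIs, and the `traverse` function are hypothetical. Data is discovered only by dereferencing URIs that appear in already matched triples, starting from a seed URI.

```python
from typing import Dict, List, Optional, Set, Tuple

Triple = Tuple[str, str, str]

# Hypothetical web of linked data: dereferencing a URI returns its document's triples.
WEB: Dict[str, Set[Triple]] = {
    "ex:alice": {("ex:alice", "knows", "ex:bob")},
    "ex:bob": {("ex:bob", "name", "Bob")},
}


def is_var(term: str) -> bool:
    return term.startswith("?")


def unify(pattern: Triple, triple: Triple, binding: Dict[str, str]) -> Optional[Dict[str, str]]:
    """Extend the binding so that pattern equals triple, or return None."""
    new = dict(binding)
    for p, v in zip(pattern, triple):
        p = new.get(p, p)  # replace an already bound variable by its value
        if is_var(p):
            new[p] = v
        elif p != v:
            return None
    return new


def traverse(patterns: List[Triple], seeds: List[str]) -> List[Dict[str, str]]:
    """Evaluate patterns in the given order, fetching data only via discovered URIs."""
    data: Set[Triple] = set()
    seen: Set[str] = set()

    def deref(uri: str) -> None:
        if uri not in seen:
            seen.add(uri)
            data.update(WEB.get(uri, set()))

    for seed in seeds:
        deref(seed)

    solutions: List[Dict[str, str]] = [{}]
    for pattern in patterns:
        next_solutions = []
        for binding in solutions:
            for triple in list(data):  # copy, because deref() may grow data
                extended = unify(pattern, triple, binding)
                if extended is not None:
                    next_solutions.append(extended)
                    for term in triple:
                        deref(term)  # follow links found in the matched triple
        solutions = next_solutions
    return solutions


knows = ("?p", "knows", "?f")
name = ("?f", "name", "?n")

good = traverse([knows, name], seeds=["ex:alice"])
bad = traverse([name, knows], seeds=["ex:alice"])
print(good)
print(bad)
```

With the seed `ex:alice`, evaluating the `knows` pattern first leads the traversal to the document for `ex:bob`, so the `name` pattern can then be answered. In the reverse order, the `name` pattern is evaluated before any document containing a name has been fetched, so the same query yields no solution.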
Funding: Supported by the National Key Research and Development Program of China under Grant No. 2017YFC0908401, the National Natural Science Foundation of China under Grant No. 61672377, and the Peiyang Young Scholars program of China under Grant No. 2019XRX-0032.
Funding: Supported by the National Natural Science Foundation of China under Grant No. 60773100, the Key Scientific and Technical Research Project of the Ministry of Education of China under Grant No. 205014, and the Science Research Plan of the Office of Education of Hebei under Grant No. 2006143.
Funding: Supported by the National High-Tech Research and Development Plan of China (863 Program) under Grant No. 2004AA112010 and the National Basic Research Program of China (973 Program) under Grant No. 2002CB312005.
Funding: Supported by the National Natural Science Foundation of China (No. 61070170), the Natural Science Foundation of Higher Education Institutions of Jiangsu Province (No. 11KJB520017), and the Suzhou Application Foundation Research Project (No. SYG201238).