Funding: Weaponry Equipment Pre-Research Foundation of PLA Equipment Ministry (No. 9140A06050409JB8102); Pre-Research Foundation of PLA University of Science and Technology (No. 2009JSJ11)
Abstract: To solve the query processing correctness problem for semantic-based relational data integration, the semantics of SPARQL (SPARQL Protocol and RDF Query Language) queries is defined. During query rewriting, all relevant tables are found and decomposed into minimal connectable units. These units are then joined according to the semantic queries to produce semantically correct query plans. Algorithms for query rewriting and transformation are presented, and their computational complexity is discussed. In the worst case, the query decomposing algorithm finishes in O(n^2) time and the query rewriting algorithm requires O(nm) time. The performance of the algorithms is verified by experiments, which show that the query processing algorithms provide satisfactory performance when the query length is less than 8.
Funding: The National Natural Science Foundation of China (No. 60673130); the Natural Science Foundation of Shandong Province (No. Y2006G29, Y2007G24, Y2007G38); the Encouragement Fund for Young Scholars of Shandong Province (No. 2005BS01002)
Abstract: Deep web data integration requires schema matching on web query interfaces in order to obtain the mapping table. By introducing semantic conflicts into web query interface integration and discussing the origins and categories of these conflicts, an ontology-based schema matching method is proposed. The method is explained in detail using the example of web query interface integration in the house domain. Conflicts are detected automatically by checking the semantic relevance degree; the categories of the conflicts are then identified and messages are sent to the conflict solver, which eliminates the conflicts and obtains the mapping table using conflict-solving rules. The proposed method is simple, easy to implement, and can be flexibly reused by extending the ontology to different domains.
Funding: Supported by the National Natural Science Foundation of China (60573091); the Natural Science Foundation of Beijing (4073035); the Key Project of Ministry of Education of China (03044)
Abstract: To help users access the information they want, much research has been dedicated to Deep Web (i.e., Web database) integration. We focus on query translation, which is an important part of Deep Web integration. Our aim is to automatically construct a set of constraint mapping rules so that the system can use them to translate a query from the integrated interface to the Web database interfaces. We construct a concept hierarchy for the attributes of the query interfaces and, in particular, store the synonyms and the type (e.g., Number, Text) of every concept. Where necessary, we also construct data hierarchies for some concepts. We then present an algorithm that generates the constraint mapping rules from these hierarchies. Because the approach is domain independent, it scales well and can be extended easily from one domain to another. Experimental results show its effectiveness and efficiency.
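The role of the concept hierarchy in query translation can be illustrated with a minimal sketch. It is not the algorithm from the paper: the concept table `CONCEPTS`, the helper names `concept_of` and `translate`, and the example labels are all hypothetical, and only synonym lookup is shown (type conversion and data hierarchies are left out).

```python
# Hypothetical concept table: each concept stores its synonyms and a type,
# as the abstract describes. The entries themselves are invented examples.
CONCEPTS = {
    "price": {"synonyms": {"price", "cost", "amount"}, "type": "Number"},
    "title": {"synonyms": {"title", "name", "book title"}, "type": "Text"},
}


def concept_of(label):
    """Return the concept whose synonym set contains the label, or None."""
    key = label.strip().lower()
    for concept, info in CONCEPTS.items():
        if key in info["synonyms"]:
            return concept
    return None


def translate(query, source_labels):
    """Rename the attributes of an integrated-interface query to the labels
    used by one source interface, matching them through shared concepts.
    Attributes with no counterpart on the source interface are dropped."""
    by_concept = {}
    for label in source_labels:
        concept = concept_of(label)
        if concept is not None:
            by_concept[concept] = label

    translated = {}
    for label, value in query.items():
        concept = concept_of(label)
        if concept in by_concept:
            translated[by_concept[concept]] = value
    return translated


if __name__ == "__main__":
    print(translate({"Price": 20, "Title": "Dune"}, ["Cost", "Name", "ISBN"]))
```

A real rule generator would also have to reconcile differing value types and granularities between interfaces, which is where the data hierarchies mentioned in the abstract would come in.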
Funding: Project supported by the 2nd Brain Korea Project
Abstract: In this paper, we propose VQT, a new algorithm for translating a relational schema (R-schema) into an XML schema. VQT analyzes value cardinality and user query patterns, and extracts implicit referential integrities by using the cardinality property of foreign key constraints between columns and the equi-join characteristic of user queries. The algorithm then applies the extracted implied referential integrity information to the R-schema and creates an XML schema as the final result. VQT therefore prevents the R-schema from being converted incorrectly into the XML schema, and, by producing an XML schema rather than an XML DTD as the translation result, it represents all the information in the R-schema richly and powerfully.
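One way to picture the extraction of implicit referential integrity is the sketch below. It is an assumption-laden illustration, not VQT itself: the function name `implied_foreign_keys`, the in-memory table layout, and the acceptance test (the referenced column is unique and contains every non-null referencing value) are hypothetical choices standing in for the cardinality analysis the abstract mentions.

```python
def implied_foreign_keys(tables, equi_joins):
    """Propose foreign keys for column pairs that users equi-join.

    tables:     {table_name: [row_dict, ...]}  (hypothetical layout)
    equi_joins: [((child_table, child_col), (parent_table, parent_col)), ...]
                taken from observed user queries.

    A pair is accepted when the parent column is unique and every non-null
    child value also occurs in the parent column.
    """
    found = []
    for (child_t, child_c), (parent_t, parent_c) in equi_joins:
        child_vals = {
            row[child_c] for row in tables[child_t] if row[child_c] is not None
        }
        parent_vals = [row[parent_c] for row in tables[parent_t]]
        parent_is_unique = len(parent_vals) == len(set(parent_vals))
        if parent_is_unique and child_vals <= set(parent_vals):
            found.append((f"{child_t}.{child_c}", f"{parent_t}.{parent_c}"))
    return found


if __name__ == "__main__":
    tables = {
        "dept": [{"id": 1, "name": "R&D"}, {"id": 2, "name": "Sales"}],
        "emp": [
            {"id": 10, "dept_id": 1},
            {"id": 11, "dept_id": 2},
            {"id": 12, "dept_id": None},
        ],
    }
    joins = [
        (("emp", "dept_id"), ("dept", "id")),  # values fit: candidate key
        (("emp", "id"), ("dept", "id")),       # values do not fit: rejected
    ]
    print(implied_foreign_keys(tables, joins))
```

A candidate found this way is only as reliable as the current data, so a production translator would presumably confirm it before emitting a key reference in the XML schema.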
Abstract: Biomedical questions are usually complex and involve several different aspects of the life sciences. Numerous valuable and heterogeneous data are increasingly available to answer such questions, yet they are stored in dispersed locations and are difficult to query comprehensively. We created a Genomic and Proteomic Data Warehouse (GPDW) that integrates data provided by some of the main bioinformatics databases. It adopts a modular integrated data schema and several metadata to describe the integrated data, their sources, and their location in the GPDW. Here, we present the Web application that we developed to enable any user to easily compose queries, even complex ones, on all data integrated in the GPDW. It is publicly available at http://www.bioinformatics.dei.polimi.it/GPKB/. Through a visual interface, the user only has to select the types of data to be included in the query and the conditions on the values to be retrieved. The Web application then leverages the metadata and modular schema of the GPDW to automatically compose an efficient SQL query, run it on the GPDW, and show the extracted data, enriched with links to external data sources. Tests demonstrated the efficiency and usability of the Web application, and showed the relevance of both the application and the GPDW in helping to answer biomedical questions, including difficult ones.
Abstract: Cleaning duplicate data remains a major problem despite the many works devoted to it, because the amount of data to be processed grows exponentially and scalable, fast algorithms are required. The problem depends on the type and quality of the data, and varies with the volume of the data set being manipulated. In this paper, we introduce a novel framework based on an extended fuzzy C-means algorithm that uses a topic ontology. This work aims to improve the OLAP querying process over heterogeneous data warehouses that contain big data sets by improving the integration of query results, eliminating redundancies with the extended classification algorithm, and measuring the loss of information.
Funding: This work is supported by the Aeronautical Science Foundation of China under Grant 20165515001; the National Natural Science Foundation of China under Grant No. 61402225; the State Key Laboratory for Smart Grid Protection and Operation Control Foundation; and the Science and Technology Funds from National State Grid Ltd. (The Research on Key Technologies of Distributed Parallel Database Storage and Processing based on Big Data).
Abstract: The Tiered Mobile Wireless Sensor Network (TMWSN) is a new paradigm introduced by mobile edge computing. It has received wide attention because of its high scalability, robustness, and deployment flexibility, and it has a wide range of application scenarios. In TMWSNs, the storage nodes are the key nodes of the network and are more easily captured and exploited by attackers. Once the storage nodes are captured, the data stored on them are exposed, and the query process and results can no longer be trusted. This paper studies secure KNN query technology in TMWSNs. We first propose a secure KNN query algorithm named the Basic Algorithm For Secure KNN Query (BAFSKQ), which protects privacy and verifies the integrity of query results. However, this algorithm has a large communication overhead in most cases. To solve this problem, we propose an improved algorithm named the Secure KNN Query Algorithm Based on MR-Tree (SEKQAM). MR-Trees are used to find the K nearest locations and to help generate a verification set for verifying the query results. It can be proved that our algorithms effectively guarantee the privacy of the data stored on the storage nodes and the integrity of the query results. Our experimental results also show that introducing MR-Trees into KNN queries on TMWSNs effectively reduces the communication overhead compared with BAFSKQ.
Abstract: With the rapid development of information technology, semantic web data have become massive and complex. As a data-centric science, social computing has great influence on collecting and analyzing semantic data. In this contribution, we propose an integrity constraint validation method for DL-LiteR based ontologies, motivated by the data correctness issue in social computing applications. First, on the basis of translating integrity constraint axioms into a set of conjunctive queries, integrity constraint validation is converted into conjunctive query answering over knowledge bases. Moreover, rewriting rules are used to reformulate the integrity constraint axioms using the standard axioms. As a result, integrity constraint validation can be reduced to query evaluation over the ABox, and the query mechanisms of database management systems can be used to optimize it. Finally, the experimental results show that the rewriting-based method greatly improves the efficiency of integrity constraint validation and is better suited to large-scale data in the semantic web.
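The reduction of integrity constraint validation to query evaluation over the ABox can be sketched as follows. This is a minimal illustration under stated assumptions, not the method of the paper: the ABox is stored in an in-memory SQLite database, the table names (`Employee`, `Manager`, `worksFor`), the standard axiom "every Manager is an Employee", and the constraint "every Employee must have a worksFor assertion" are all hypothetical. The standard axiom is compiled into the query as a `UNION`, so only the stored assertions are consulted, and the `NOT EXISTS` test is an illustrative stand-in for how a violation might be detected.

```python
import sqlite3


def build_abox():
    """Create a small, hypothetical ABox as relational tables."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE Employee (individual TEXT PRIMARY KEY);
        CREATE TABLE Manager  (individual TEXT PRIMARY KEY);
        CREATE TABLE worksFor (subject TEXT, object TEXT);
        INSERT INTO Employee VALUES ('alice'), ('bob');
        INSERT INTO Manager  VALUES ('carol'), ('dave');
        INSERT INTO worksFor VALUES ('alice', 'acme'), ('carol', 'acme');
        """
    )
    return conn


# Rewritten constraint: the assumed axiom "Manager is a subclass of Employee"
# is folded into the query as a UNION, so no reasoning is needed at run time.
VIOLATION_QUERY = """
    SELECT e.individual
    FROM (
        SELECT individual FROM Employee
        UNION
        SELECT individual FROM Manager
    ) AS e
    WHERE NOT EXISTS (
        SELECT 1 FROM worksFor AS w WHERE w.subject = e.individual
    )
    ORDER BY e.individual
"""


def find_violations(conn):
    """Return the individuals that violate the assumed constraint."""
    return [row[0] for row in conn.execute(VIOLATION_QUERY)]


if __name__ == "__main__":
    print(find_violations(build_abox()))
```

With the sample data, `bob` and `dave` are reported because neither has a `worksFor` assertion. Pushing the check into the database lets its optimizer and indexes do the work, which is the kind of benefit the abstract attributes to the rewriting-based approach.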
Funding: Supported by the National High-Tech Research and Development Plan of China (863 Program) under Grant No. 2004AA112010; the National Basic Research Program of China (973 Program) under Grant No. 2002CB312005