Tiered Mobile Wireless Sensor Network (TMWSN) is a new paradigm introduced by mobile edge computing. It has received wide attention because of its high scalability, robustness, and deployment flexibility, and it has a wide range of application scenarios. In TMWSNs, the storage nodes are the key nodes of the network and are more easily captured and exploited by attackers. Once the storage nodes are captured, the data stored on them is exposed, and the query process and results can no longer be trusted. This paper studies secure KNN query technology in TMWSNs. We first propose a secure KNN query algorithm named the Basic Algorithm For Secure KNN Query (BAFSKQ), which can protect privacy and verify the integrity of query results. However, this algorithm has a large communication overhead in most cases. To solve this problem, we propose an improved algorithm named the Secure KNN Query Algorithm Based on MR-Tree (SEKQAM). The MR-Trees are used to find the K-nearest locations and to help generate a verification set for verifying query results. It can be proved that our algorithms effectively guarantee the privacy of the data stored on the storage nodes and the integrity of the query results. Our experimental results also show that introducing MR-Trees into KNN queries on TMWSNs effectively reduces the communication overhead compared to BAFSKQ.
Answering reachability queries is one of the fundamental graph operations. Existing approaches either accelerate index construction by building an index that covers only part of the reachability relationship, which may require a costly traversal operation when answering a query; or accelerate query answering by building an index that covers the complete reachability relationship, which may be inefficient because complete node labels must be compared. We propose a novel labeling scheme, which covers the complete reachability relationship, to accelerate reachability query processing. The idea is to decompose the given directed acyclic graph (DAG) G into two subgraphs, G1 and G2. For G1, we propose to use topological labels consisting of two integers to answer all reachability queries. For G2, we construct 2-hop labels, as existing methods do, to answer queries that cannot be answered by topological labels. The benefits of our method lie in two aspects. On one hand, our method does not need to perform the costly traversal operation when answering queries. On the other hand, our method can quickly answer most queries in constant time without comparing whole node labels. We confirm the efficiency of our approach by extensive experimental studies using 20 real datasets.
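To make the idea of constant-time answers from small integer labels concrete, the following is a minimal sketch and not the labeling scheme of the paper. It assumes a single integer label per node (its topological rank) and uses it only to reject queries; where the abstract relies on 2-hop labels, this sketch falls back to a rank-pruned traversal. The names `topological_ranks` and `reaches` are hypothetical.

```python
from collections import deque


def topological_ranks(graph):
    """Return {node: rank} for a DAG given as {node: [successors]} (Kahn's algorithm)."""
    indegree = {u: 0 for u in graph}
    for u in graph:
        for v in graph[u]:
            indegree[v] = indegree.get(v, 0) + 1
    queue = deque(u for u, d in indegree.items() if d == 0)
    rank = {}
    while queue:
        u = queue.popleft()
        rank[u] = len(rank)
        for v in graph.get(u, []):
            indegree[v] -= 1
            if indegree[v] == 0:
                queue.append(v)
    return rank


def reaches(graph, rank, u, v):
    """Answer 'can u reach v?' using the rank label first, traversal second."""
    if u == v:
        return True
    # Constant-time rejection: an edge always goes from a lower to a higher rank.
    if rank[u] > rank[v]:
        return False
    # Fallback (assumption of this sketch): depth-first search pruned by rank.
    stack, seen = [u], {u}
    while stack:
        x = stack.pop()
        for y in graph.get(x, []):
            if y == v:
                return True
            if y not in seen and rank[y] < rank[v]:
                seen.add(y)
                stack.append(y)
    return False


dag = {"a": ["b", "c"], "b": ["d"], "c": ["d"], "d": [], "e": ["d"]}
ranks = topological_ranks(dag)
print(reaches(dag, ranks, "a", "d"), reaches(dag, ranks, "d", "a"))
```

The rank comparison can only prove unreachability, which is why a second mechanism is still needed for the remaining queries.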
In the XML community, exact queries allow users to specify exactly what they want to check and/or retrieve in an XML document. When such queries are applied to a semi-structured document or to a document with an overly complex model, the absence or ignorance of an explicit document model (DTD, i.e. Document Type Definition; Schema; etc.) increases the risk of obtaining an empty result set when the query is too specific, or too large a result set when it is too vague (e.g. when it contains wildcards such as “*”). The reason is that in both cases users write queries according to the document model they have in mind; this can be very far from the one that can actually be extracted from the document. In contrast to exact queries, preference queries are more flexible and can be relaxed to expand the search space during their evaluation. Indeed, during evaluation, certain constraints (the preferences they contain) can be relaxed if necessary, precisely to avoid empty results; moreover, the returned answers can be filtered to retain only the best ones. This paper presents an algorithm for evaluating such queries, inspired by the TreeMatch algorithm proposed by Yao et al. for exact queries. In the proposed algorithm, the best answers are obtained by adapting the Skyline operator (defined for relational databases) to documents (trees), incrementally filtering the set of partial solutions to keep those that satisfy the maximum number of preferential constraints. The only restriction imposed on documents is No-Self-Containment.
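The Skyline operator mentioned above keeps the candidates that no other candidate dominates. The sketch below shows the generic relational form of that filter, not the tree adaptation of the paper. It assumes each partial solution is summarized by a tuple of scores, one per preference, where a higher score is better; the names `dominates` and `skyline` and the sample data are hypothetical.

```python
def dominates(a, b):
    """True if score vector a is at least as good as b everywhere and strictly better somewhere."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))


def skyline(candidates):
    """Return the (name, scores) pairs whose scores are not dominated by any other candidate."""
    return [
        c for c in candidates
        if not any(dominates(other[1], c[1]) for other in candidates)
    ]


# One score per preference: 1 means the preference is satisfied, 0 means it was relaxed.
solutions = [("s1", (1, 1, 0)), ("s2", (1, 0, 0)), ("s3", (0, 1, 1))]
print([name for name, _ in skyline(solutions)])
```

Here `s2` is dropped because `s1` satisfies every preference that `s2` satisfies plus one more, while `s1` and `s3` are incomparable and both survive.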
Geospatial datasets are typically available as distributed collections contributed by various government or commercial providers. Supporting the diverse needs of various users that may be accessing the same dataset for different applications remains a challenging issue. To overcome this challenge, there is a clear need to develop capabilities that take into account complicated patterns of preference describing user and/or application particularities, and that use these patterns to rank query results in terms of suitability. This paper demonstrates how intelligent systems can assist geospatial queries to improve retrieval accuracy by customizing results based on preference patterns. We outline the particularities of the geospatial domain and present our method and its application.
It is nontrivial to maintain such discovered frequent query patterns in a real XML-DBMS because the transaction database of queries may allow frequent updates, and such updates may not only invalidate some existing frequent query patterns but also generate some new frequent query patterns. In this paper, two incremental updating algorithms, FUX-QMiner and FUXQMiner, are proposed for efficient maintenance of discovered frequent query patterns and generation of new frequent query patterns when new XML queries are added to the database. Experimental results from our implementation show that the proposed algorithms have good performance.
Key words: XML; frequent query pattern; incremental algorithm; data mining
CLC number: TP 311
Foundation item: Supported by the Youthful Foundation for Scientific Research of University of Shanghai for Science and Technology
Biography: PENG Dun-lu (1974-), male, Associate professor, Ph.D., research direction: data mining, Web service and its application, peer-to-peer computing.
This paper proposes a checking method based on mutual instances and discusses three key problems in the method: how to deal with mistakes in the mutual instances, with too many mutual instances, and with too few mutual instances. It provides checking based on weighted mutual instances that takes fault tolerance into account, gives a way to partition large-scale mutual instances, and proposes a process that greatly reduces the manual annotation work needed to get more mutual instances. Intension annotation that improves the checking method is also discussed. The method is practical and effective for checking subsumption relations between concept queries in different ontologies based on mutual instances.
Finding all occurrences of a twig query in an XML database is a core operation for efficient evaluation of XML queries. It is important to handle twig queries with wildcards effectively. In this paper, a novel path-partitioned encoding scheme is proposed for XML documents to capture the paths of all elements, and a twig query is modeled as an XPattern extended from the tree pattern. After definition, simplification, normalization, verification and initialization of the XPattern, both work sets and a join plan are generated. Based on these measures, an effective algorithm for answering a twig query, called DMTwig, is designed that avoids unnecessary elements and invalid structural joins. The algorithm can adaptively deal with twig queries that combine branch ([ ]), child edge (/), descendant edge (//), and wildcard (*). We show that the path-partitioned encoding scheme and XPattern guarantee I/O and CPU optimality for twig queries. Experiments on a representative data set indicate that the proposed solution performs significantly.
The moving object database (MOD) engine is the foundation of Location-Based Service (LBS) information systems. Continuous queries are important in the spatial-temporal reasoning of a MOD. Communication costs were the bottleneck for improving query efficiency until the rectangular safe region algorithm partly solved this problem. However, this algorithm can be further improved, as we demonstrate with the dynamic interval based continuous queries algorithm on moving objects. Our algorithm adopts two components: circular safe regions and dynamic intervals. Theoretical proof and experimental results show that our algorithm substantially outperforms traditional periodic monitoring and the rectangular safe region algorithm in terms of monitoring accuracy, communication costs and server CPU time. Moreover, in our algorithm, the mobile terminals do not need any computational ability.
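As an illustration of what a circular safe region can look like, here is a minimal geometric sketch; it is not the algorithm of the paper and does not model dynamic intervals. It assumes that every continuous query is a circular range query given as a center and a radius, so that an object can move freely within a circle whose radius is its distance to the nearest query boundary. The names `safe_radius` and `has_left_safe_region` are hypothetical, and where the check runs (server or terminal) is left open.

```python
import math


def safe_radius(position, queries):
    """Radius of the largest circle around position that crosses no query boundary.

    queries is an iterable of ((cx, cy), r) circular range queries (an assumption).
    """
    return min(abs(math.dist(position, center) - r) for center, r in queries)


def has_left_safe_region(anchor, radius, current):
    """True once the object has moved farther than radius from the anchor position."""
    return math.dist(anchor, current) > radius


queries = [((10.0, 0.0), 4.0), ((0.0, 3.0), 5.0)]
anchor = (0.0, 0.0)
radius = safe_radius(anchor, queries)
print(radius, has_left_safe_region(anchor, radius, (3.0, 0.0)))
```

As long as the object stays inside the circle, its membership in every query result is unchanged, so no location update needs to be communicated.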
Users can obtain information through basic web searching and find answers to their questions directly, but the expected answer may not exist, and users are not informed of new information in a timely manner. Online social networking services spread quickly and store a great deal of user data, but these data are of limited value and may give unreliable answers to users' questions. In a knowledge question-answering (QA) system, users can obtain a simple answer but cannot expect additional information. In this paper, we design a system that combines the advantages of knowledge QA systems, web searching, and the characteristics of social networking services to provide a social network channel based on queries and answers, without relying on the users' contact networks. Through the network channel of this system, a user can obtain real-time answers from the network of users interested in the query, effectively get additional information, and share it with others in the social network channel.
The k-median problem has attracted a number of researchers. However, few of them have considered both the dynamic environment and the issue of accuracy. In this paper, a new type of query is studied, called the continuous median monitoring (CMM) query. It considers the k-median problem in a dynamic environment with an accuracy guarantee. A continuous group nearest neighbor based (CGB) algorithm and an average distance medoid (ADM) algorithm are proposed to solve the CMM problem. ADM is a hill-climbing algorithm and converges rapidly by checking only qualified candidates. Experiments show that ADM is more efficient than CGB and outperforms the classical PAM (partitioning around medoids) and CLARANS (clustering large applications based on randomized search) algorithms with various parameter settings.
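For readers unfamiliar with hill climbing over medoids, the following is a minimal sketch of a PAM-style swap search, the classical baseline named above. It is not ADM: it tries every non-medoid point as a swap candidate, whereas the abstract states that ADM checks only qualified candidates. The function names and the one-dimensional sample data are hypothetical.

```python
def total_cost(points, medoids, dist):
    """Sum over all points of the distance to the nearest medoid."""
    return sum(min(dist(p, m) for m in medoids) for p in points)


def hill_climb_medoids(points, medoids, dist):
    """Repeatedly swap a medoid for a non-medoid point while the total cost decreases."""
    medoids = list(medoids)
    best = total_cost(points, medoids, dist)
    improved = True
    while improved:
        improved = False
        for i in range(len(medoids)):
            for candidate in points:
                if candidate in medoids:
                    continue
                trial = medoids[:i] + [candidate] + medoids[i + 1:]
                cost = total_cost(points, trial, dist)
                if cost < best:
                    # Accept the swap immediately (first-improvement hill climbing).
                    medoids, best, improved = trial, cost, True
    return medoids, best


points = [0, 1, 2, 10, 11, 12]
medoids, cost = hill_climb_medoids(points, [0, 1], lambda a, b: abs(a - b))
print(sorted(medoids), cost)
```

The search stops at a local optimum: a state in which no single swap lowers the total distance.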
In a database-as-a-service (DaaS) model, a data owner stores data in a database server of a service provider, and the DaaS adopts encryption for data privacy and indexing for data query. However, an attacker can obtain the original data's statistical information and distribution via the index distribution from the database of the service provider. In this work, a novel indexing schema is proposed to satisfy privacy-preserving data management requirements, in which an attacker cannot obtain the data source distribution or statistical information from the index. The approach includes two parts: Hash-based indexing for encrypted data and correctness verification for range queries. The evaluation results demonstrate that the approach can hide the statistical information of the encrypted data distribution while still obtaining correct answers for range queries. Meanwhile, the approach achieves nearly 10 times and 35 times improvement on encrypted data publishing and indexing respectively, compared with the state-of-the-art method, the order-preserving Hash-based function (OPHF).
Efficiently querying Description Logic (DL) ontologies is becoming a vital task in various data-intensive DL applications. Considered as a basic service for answering object queries over DL ontologies, instance checking can be realized by using the most specific concept (MSC) method, which converts instance checking into subsumption problems. This method, however, loses its simplicity and efficiency when applied to large and complex ontologies, as it tends to generate very large MSCs that could lead to intractable reasoning. In this paper, we propose a revision to this MSC method for DL SHI, allowing it to generate much simpler and smaller concepts that are specific enough to answer a given query. With independence between computed MSCs, scalability for query answering can also be achieved by distributing and parallelizing the computations. An empirical evaluation shows the efficacy of our revised MSC method and the significant efficiency achieved when using it for answering object queries.
The performance and reliability of converting natural language into structured query language can be problematic when handling the nuances that are prevalent in natural language. Relational databases are not designed to understand language nuance, so the question of why we must handle nuance has to be asked. This paper looks at an alternative solution for converting a Natural Language Query into a Structured Query Language (SQL) statement that can be used to search a relational database. The process uses the natural language concept of Part of Speech to identify words that can be used to identify database tables and table columns. Open NLP based grammar files, together with additional configuration files, assist in the translation from natural language to query language. Once the tables and columns that contain the pertinent data have been identified, the next step is to create the SQL statement.
In many database applications, ranking queries may reference both text and numeric attributes, where the ranking functions are based on both semantic distances/similarities for text attributes and numeric distances for numeric attributes. In this paper, we propose a new method for evaluating this type of ranking query over a relational database. Using statistics and training, the method builds a mechanism that combines the semantic and numeric distances, and this mechanism can be used to balance the effects of text attributes and numeric attributes when matching a given query against tuples in a database search. The basic idea of the method is to create an index based on WordNet, to expand the tuple words semantically for text attributes, and on the information of numeric attributes. The candidate results for a query are retrieved using the index and a simple SQL selection statement, and then the top-N answers are obtained. The results of extensive experiments indicate that this new strategy is efficient and effective.
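One simple way to picture a mechanism that balances semantic and numeric distances is a weighted sum followed by a top-N selection. The sketch below is only that illustration: the abstract says the actual mechanism is built by statistics and training, which is not modeled here. It assumes both distances are already normalized to [0, 1]; the `weight` parameter, the function names and the sample tuples are hypothetical.

```python
import heapq


def combined_distance(semantic_dist, numeric_dist, weight=0.5):
    """Weighted sum of two normalized distances; weight is the share given to text attributes."""
    return weight * semantic_dist + (1.0 - weight) * numeric_dist


def top_n(candidates, n, weight=0.5):
    """Return the n candidates (tuple_id, semantic_dist, numeric_dist) with the smallest combined distance."""
    return heapq.nsmallest(
        n, candidates, key=lambda c: combined_distance(c[1], c[2], weight)
    )


candidates = [("t1", 0.2, 0.8), ("t2", 0.5, 0.1), ("t3", 0.9, 0.9)]
print([c[0] for c in top_n(candidates, 2)])
print([c[0] for c in top_n(candidates, 2, weight=1.0)])
```

Changing `weight` shifts the ranking between text-driven and number-driven matches, which is the balancing effect the abstract refers to.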
This paper presents the semantic analysis of queries written in natural language (French) and addressed to object-oriented databases. The studied queries include one or two nominal groups (NG) articulated around a verb. An NG consists of one or several keywords (an application-dependent noun or value). Simple semantic filters are defined for identifying these keywords, each of which can have one of the following semantic values: class, simple attribute, composed attribute, key value or non-key value. Coherence rules and coherence constraints are introduced to check the validity of the co-occurrence of two consecutive nouns in a complex NG. If a query consists of a single NG, no further analysis is required. Otherwise, if a query covers two valid NGs, the semantic coherence of the verb and the two NGs attached to it is studied.
Biomedical questions are usually complex and concern several different life science aspects. Numerous valuable and heterogeneous data are increasingly available to answer such questions. Yet, they are stored in dispersed locations and are difficult to query comprehensively. We created a Genomic and Proteomic Data Warehouse (GPDW) that integrates data provided by some of the main bioinformatics databases. It adopts a modular integrated data schema and several metadata to describe the integrated data, their sources and their location in the GPDW. Here, we present the Web application that we developed to enable any user to easily compose queries, even complex ones, on all data integrated in the GPDW. It is publicly available at http://www.bioinformatics.dei.polimi.it/GPKB/. Through a visual interface, the user is only required to select the types of data to be included in the query and the conditions on their values to be retrieved. Then, the Web application leverages the metadata and modular schema of the GPDW to automatically compose an efficient SQL query, run it on the GPDW and show the extracted requested data, enriched with links to external data sources. Tests demonstrated the efficiency and usability of the developed Web application, and showed the relevance of both the application and the GPDW in supporting the answering of biomedical questions, including difficult ones.
Purpose: Existing research on predicting queries with news intent has tried to extract classification features from external knowledge bases. This paper presents how to apply features extracted from query logs to the automatic identification of news queries without using any external resources. Design/methodology/approach: First, we manually labeled 1,220 news queries from Sogou.com. Based on the analysis of these queries, we then identified three features of news queries in terms of query content, time of query occurrence and user click behavior. Afterwards, we used 12 effective features proposed in the literature as a baseline and conducted experiments based on the support vector machine (SVM) classifier. Finally, we compared the impacts of the features used in this paper on the identification of news queries. Findings: Compared with the baseline features, the F-score improved from 0.6414 to 0.8368 after the use of the three newly identified features, among which the burst point (bst) was the most effective in predicting news queries. In addition, query expression (qes) was more useful than query terms, and among the click behavior-based features, news URL was the most effective one. Research limitations: Analyses based on features extracted from query logs might produce limited results. The segmentation tool used in this study has been more widely applied to long texts than to short queries. Practical implications: The research will be helpful for general-purpose search engines in addressing search intents for news events. Originality/value: Our approach provides a new and different perspective on recognizing queries with news intent without relying on large news corpora such as blogs or Twitter.
In traditional database applications, queries are intended to retrieve data satisfying precise conditions. As a result, thousands of data items can be retrieved (overabundant answer) or, even worse, no data at all (empty answer). In both cases, the queries must be reformulated to produce more significant results and, typically, a user submits many related queries before being finally satisfied. To overcome these problems, this paper proposes a unified solution in the framework of flexible queries with fuzzy semantics. This solution, based on the concept of semantic proximity and implemented in a tool for flexible query answering, allows the automatic reformulation of queries with empty or overabundant answers.
Visual queries help non-expert users extract information from spatial databases in an intuitive and natural way, making geographic information systems comprehensive and efficient for a wide range of applications. A common visual means of querying takes the form of drawings or graphs, which give rise to many spatial ambiguities and translation errors. In this study, common query attributes extracted from user graphs, such as spatial topology, size, cardinality, and proximity, are considered under a conceptual moderation scheme. Thus, the system or user may concentrate on various conceptual combinations of information. Furthermore, time is incorporated to support spatiotemporal queries for changing scenes and moving objects. Arbitrary, relative, and absolute scaling is possible according to the data set and application at hand. The theoretical approach is implemented in a prototype user interface system called ShapeController. With this prototype, a user may extract scene-based relations in an automatically inferred fashion, or include single object-oriented relations when all possible relations seem redundant. Finally, a natural language description of the query is extracted, from which the user may select the desired query relations. Experimentation on a spatial database demonstrates the concepts of predefined draw objects, scaling relaxation, conceptual abstraction, and scene-, object-, and textual-oriented transitions that promote query expressiveness and restrain ambiguities.
We consider the problem of efficiently computing distributed geographical k-NN queries in an unstructured peer-to-peer (P2P) system, in which each peer is managed by an individual organization and can only communicate with its logical neighboring peers. Such queries are based on local filter query statistics and require as little communication cost as possible, which makes them more difficult than existing distributed k-NN queries. In particular, we aim to reduce the number of candidate peers and the communication cost. In this paper, we propose an efficient pruning technique to minimize the number of candidate peers to be processed to answer the k-NN queries. Our approach is especially suitable for continuous k-NN queries when peers are updated, including changes to the ranges of peers, peers dynamically leaving or joining, and data updates within a peer. In addition, simulation results show that the proposed approach outperforms the existing Minimum Bounding Rectangle (MBR)-based query approaches, especially for continuous queries.
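For context on the MBR-based baseline mentioned above, the sketch below shows the standard minimum-distance test between a query point and a rectangle, and how it can be used to discard peers that cannot improve the current k-th nearest distance. It illustrates the baseline idea only, not the pruning technique proposed in the paper. It assumes each peer advertises a two-dimensional MBR as `(xmin, ymin, xmax, ymax)`; the function names and peer identifiers are hypothetical.

```python
import math


def min_dist(point, mbr):
    """Smallest Euclidean distance from point to any location inside the rectangle."""
    x, y = point
    xmin, ymin, xmax, ymax = mbr
    dx = max(xmin - x, 0.0, x - xmax)
    dy = max(ymin - y, 0.0, y - ymax)
    return math.hypot(dx, dy)


def prune_peers(query, peers, kth_distance):
    """Keep only peers whose MBR could contain a point within kth_distance of the query."""
    return [
        peer_id for peer_id, mbr in peers.items()
        if min_dist(query, mbr) <= kth_distance
    ]


peers = {"p1": (1, 1, 2, 2), "p2": (5, 0, 6, 1), "p3": (-1, -1, 1, 1)}
print(prune_peers((0, 0), peers, kth_distance=2.0))
```

A peer whose rectangle lies entirely farther away than the current k-th neighbor cannot contribute to the answer, so it does not need to be contacted.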
Funding (TMWSN secure KNN query paper): This work is supported by the Aeronautical Science Foundation of China under Grant 20165515001, the National Natural Science Foundation of China under Grant No. 61402225, the State Key Laboratory for Smart Grid Protection and Operation Control Foundation, and the Science and Technology Funds from National State Grid Ltd. (The Research on Key Technologies of Distributed Parallel Database Storage and Processing based on Big Data).
Funding (reachability query labeling paper): This work was partly supported by the National Key R&D Program of China, Grant No. 2017YFB0309800; grants from the Natural Science Foundation of China (No. 61472339, No. 61303040, No. 61572421, No. 61272124); the Shanghai Alliance Program (LM201552); and Shanghai University of Engineering and Technology School-Enterprise cooperation projects (15)(DZ-025).
Funding: Supported by the National Natural Science Foundation of China (60373066, 60425206, 90412003), the National Grand Fundamental Research 973 Program of China (2002CB312000), and the National Research Foundation for the Doctoral Program of Higher Education of China (20020286004)
Abstract: This paper proposes a checking method based on mutual instances and discusses three key problems in the method: how to deal with mistakes in the mutual instances, with too many mutual instances, and with too few mutual instances. It provides checking based on weighted mutual instances for fault tolerance, gives a way to partition large-scale mutual instances, and proposes a process that greatly reduces the manual annotation work needed to obtain more mutual instances. Intension annotation, which improves the checking method, is also discussed. The method is practical and effective for checking subsumption relations between concept queries in different ontologies based on mutual instances.
Funding: Supported by the National High-Tech Research and Development Plan of China (Grant No. 2005AA4Z3030)
Abstract: Finding all occurrences of a twig query in an XML database is a core operation for efficient evaluation of XML queries. It is important to handle twig queries with wildcards effectively. In this paper, a novel path-partitioned encoding scheme is proposed for XML documents to capture the paths of all elements, and a twig query is modeled as an XPattern extended from a tree pattern. After definition, simplification, normalization, verification, and initialization of the XPattern, both work sets and a join plan are generated. Based on these measures, an effective algorithm for answering a twig query, called DMTwig, is designed to avoid unnecessary elements and invalid structural joins. The algorithm can adaptively handle twig queries that combine branch ([ ]), child edge (/), descendant edge (//), and wildcard (*). We show that the path-partitioned encoding scheme and XPattern guarantee I/O and CPU optimality for twig queries. Experiments on a representative data set indicate that the proposed solution performs well.
Abstract: The moving object database (MOD) engine is the foundation of Location-Based Service (LBS) information systems. Continuous queries are important in the spatial-temporal reasoning of a MOD. Communication costs were the bottleneck for improving query efficiency until the rectangular safe region algorithm partly solved this problem. However, this algorithm can be further improved, as we demonstrate with the dynamic interval based continuous queries algorithm on moving objects. Our algorithm adopts two components: circular safe regions and dynamic intervals. Theoretical proof and experimental results show that our algorithm substantially outperforms traditional periodic monitoring and the rectangular safe region algorithm in terms of monitoring accuracy, communication cost, and server CPU time. Moreover, in our algorithm, the mobile terminals do not need any computational ability.
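To make the two components concrete, here is a minimal server-side sketch under stated assumptions. The abstract does not define the circular safe region or the dynamic interval, so this is one plausible reading and not the authors' method. It assumes a circular range query, takes the safe radius as the object's distance to the query boundary, and takes the next reporting interval as the time an object with a known maximum speed needs to leave that circle. The names `safe_radius`, `next_report_interval`, and `max_speed` are hypothetical.

```python
import math
from typing import Tuple

Point = Tuple[float, float]


def safe_radius(obj: Point, query_center: Point, query_radius: float) -> float:
    """Radius of the largest circle around `obj` inside which its
    membership in the circular range query cannot change."""
    return abs(query_radius - math.dist(obj, query_center))


def next_report_interval(radius: float, max_speed: float) -> float:
    """Earliest time at which an object moving at most `max_speed`
    could leave its safe circle; the server polls it no sooner."""
    if max_speed <= 0:
        raise ValueError("max_speed must be positive")
    return radius / max_speed


# Object at the origin, query circle centered at (3, 4) with radius 10.
r = safe_radius((0.0, 0.0), (3.0, 4.0), 10.0)   # distance 5, so r = 5.0
wait = next_report_interval(r, max_speed=2.0)   # 2.5 time units
```

Because both values are computed on the server, the sketch is consistent with the statement that mobile terminals need no computational ability.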
Funding: Industrial Strategic Technology Development Program, Development of a Cognitive Planning and Learning Model for Mobile Platforms (No. 10035348), funded by MKE (the Ministry of Knowledge Economy), Korea
Abstract: Users can obtain information through basic web searching and find answers to their questions directly, but the expected answer may not exist. Moreover, users do not learn of newly updated information in time. Online social networking services spread quickly and store large amounts of user data, but these data are of little value and may give unreliable answers to users' questions. In a knowledge question-answering (QA) system, users can obtain a simple answer but cannot expect additional information. In this paper, we design a system that combines the advantages of knowledge QA systems and web searching with the characteristics of social networking services, providing a social network channel that is based on the query and answer and does not depend on the users' contact network. Through the network channel of this system, a user can obtain real-time answers from the network of users interested in the user's queries, get additional information effectively, and share it with others in the social network channel.
Abstract: The k-median problem has attracted a number of researchers. However, few of them have considered both the dynamic environment and the issue of accuracy. In this paper, a new type of query is studied, called the continuous median monitoring (CMM) query. It considers the k-median problem in a dynamic environment with an accuracy guarantee. A continuous group nearest neighbor based (CGB) algorithm and an average distance medoid (ADM) algorithm are proposed to solve the CMM problem. ADM is a hill-climbing algorithm and converges rapidly by checking only qualified candidates. Experiments show that ADM is more efficient than CGB and outperforms the classical PAM (partitioning around medoids) and CLARANS (clustering large applications based on randomized search) algorithms under various parameter settings.
Funding: Supported by the National Natural Science Foundation of China (No. 61931019).
Abstract: In a database-as-a-service (DaaS) model, a data owner stores data in a database server of a service provider, and the DaaS adopts encryption for data privacy and indexing for data query. However, an attacker can obtain the original data's statistical information and distribution via the index distribution in the service provider's database. In this work, a novel indexing schema is proposed to satisfy privacy-preserving data management requirements, in which an attacker cannot obtain the data source distribution or statistical information from the index. The approach includes two parts: Hash-based indexing for encrypted data and correctness verification for range queries. The evaluation results demonstrate that the approach can hide the statistical information of the encrypted data distribution while still returning correct answers for range queries. Meanwhile, the approach achieves nearly 10 times and 35 times improvement on encrypted data publishing and indexing, respectively, compared with the state-of-the-art method, the order-preserving Hash-based function (OPHF).
Abstract: Efficiently querying Description Logic (DL) ontologies is becoming a vital task in various data-intensive DL applications. Considered a basic service for answering object queries over DL ontologies, instance checking can be realized by using the most specific concept (MSC) method, which converts instance checking into subsumption problems. This method, however, loses its simplicity and efficiency when applied to large and complex ontologies, as it tends to generate very large MSCs that could lead to intractable reasoning. In this paper, we propose a revision of this MSC method for the DL SHI, allowing it to generate much simpler and smaller concepts that are specific enough to answer a given query. With independence between computed MSCs, scalability for query answering can also be achieved by distributing and parallelizing the computations. An empirical evaluation shows the efficacy of our revised MSC method and the significant efficiency achieved when using it for answering object queries.
Abstract: The performance and reliability of converting natural language into structured query language can be problematic when handling the nuances that are prevalent in natural language. Relational databases are not designed to understand language nuance, so it is worth asking why nuance must be handled at all. This paper examines an alternative solution for converting a Natural Language Query into a Structured Query Language (SQL) statement that can be used to search a relational database. The process uses the natural language concept of part of speech to identify words that can be used to identify database tables and table columns. The use of Open NLP-based grammar files, as well as additional configuration files, assists in the translation from natural language to query language. Once the tables and columns containing the pertinent data have been identified, the next step is to create the SQL statement.
Abstract: In many database applications, ranking queries may reference both text and numeric attributes, where the ranking functions are based on semantic distances/similarities for text attributes and numeric distances for numeric attributes. In this paper, we propose a new method for evaluating this type of ranking query over a relational database. Through statistics and training, this method builds a mechanism that combines the semantic and numeric distances, and the mechanism can be used to balance the effects of text attributes and numeric attributes when matching a given query against tuples during database search. The basic idea of the method is to create an index based on WordNet, to expand the tuple words semantically for text attributes, and on information about the numeric attributes. The candidate results for a query are retrieved using the index and a simple SQL selection statement, and then the top-N answers are obtained. The results of extensive experiments indicate that this new strategy is efficient and effective.
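As an illustration of combining the two kinds of distance, the sketch below ranks candidate tuples by a weighted sum and keeps the top-N. The abstract does not give the actual combination mechanism (it is learned "by statistics and training"), so the linear form, the fixed `weight` parameter, and the assumption that both distances are already normalized to [0, 1] are all hypothetical. The function names are illustrative only.

```python
import heapq
from typing import Iterable, List, Tuple

# (tuple_id, semantic_distance, numeric_distance), distances assumed in [0, 1]
Row = Tuple[str, float, float]


def combined_distance(sem_dist: float, num_dist: float, weight: float = 0.5) -> float:
    """Weighted sum; `weight` balances text against numeric attributes."""
    return weight * sem_dist + (1.0 - weight) * num_dist


def top_n(rows: Iterable[Row], n: int, weight: float = 0.5) -> List[Row]:
    """Return the n candidates with the smallest combined distance."""
    return heapq.nsmallest(
        n, rows, key=lambda r: combined_distance(r[1], r[2], weight)
    )


# Hypothetical candidates already retrieved by an index and SQL selection.
candidates = [("a", 0.2, 0.8), ("b", 0.1, 0.1), ("c", 0.9, 0.0)]
ranked_ids = [row[0] for row in top_n(candidates, 2)]
```

In a trained system the weight would be fitted from data; it is a constant here only to keep the example self-contained.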
Abstract: This paper presents the semantic analysis of queries written in natural language (French) and addressed to object-oriented databases. The studied queries include one or two nominal groups (NGs) articulated around a verb. An NG consists of one or several keywords (an application-dependent noun or value). Simple semantic filters are defined for identifying these keywords, each of which can take one of the following semantic values: class, simple attribute, composed attribute, key value, or non-key value. Coherence rules and coherence constraints are introduced to check the validity of the co-occurrence of two consecutive nouns in a complex NG. If a query consists of a single NG, no further analysis is required. Otherwise, if a query covers two valid NGs, the semantic coherence between the verb and the two NGs attached to it is examined.
Abstract: Biomedical questions are usually complex and involve several different life science aspects. Numerous valuable and heterogeneous data are increasingly available to answer such questions. Yet they are stored in dispersed locations and are difficult to query comprehensively. We created a Genomic and Proteomic Data Warehouse (GPDW) that integrates data provided by some of the main bioinformatics databases. It adopts a modular integrated data schema and several metadata to describe the integrated data, their sources, and their location in the GPDW. Here, we present the Web application that we developed to enable any user to easily compose queries, even complex ones, on all data integrated in the GPDW. It is publicly available at http://www.bioinformatics.dei.polimi.it/GPKB/. Through a visual interface, the user is only required to select the types of data to be included in the query and the conditions on the values to be retrieved. Then, the Web application leverages the metadata and modular schema of the GPDW to automatically compose an efficient SQL query, run it on the GPDW, and show the requested data, enriched with links to external data sources. The tests performed demonstrated the efficiency and usability of the developed Web application, and showed the relevance of both the application and the GPDW in supporting the answering of biomedical questions, including difficult ones.
Funding: Supported by the Social Science Planning Foundation of Chongqing (Grant No. 2011QNCB28)
Abstract: Purpose: Existing research on predicting queries with news intent has tried to extract classification features from external knowledge bases. This paper shows how to apply features extracted from query logs to the automatic identification of news queries without using any external resources. Design/methodology/approach: First, we manually labeled 1,220 news queries from Sogou.com. Based on the analysis of these queries, we then identified three features of news queries in terms of query content, time of query occurrence, and user click behavior. Afterwards, we used 12 effective features proposed in the literature as a baseline and conducted experiments based on the support vector machine (SVM) classifier. Finally, we compared the impacts of the features used in this paper on the identification of news queries. Findings: Compared with the baseline features, the F-score improved from 0.6414 to 0.8368 after the use of the three newly identified features, among which the burst point (bst) was the most effective in predicting news queries. In addition, query expression (qes) was more useful than query terms, and among the click behavior-based features, news URL was the most effective one. Research limitations: Analyses based on features extracted from query logs might produce limited results. The segmentation tool used in this study has been applied more widely to long texts than to short queries. Practical implications: The research will help general-purpose search engines address search intents for news events. Originality/value: Our approach provides a new and different perspective on recognizing queries with news intent without large news corpora such as blogs or Twitter.
Funding: Supported by CNPq (Brazilian National Council of Technological and Scientific Development), under grant numbers 305484/2012-5 and 104200/2013-8.
Abstract: In traditional database applications, queries are intended to retrieve data satisfying precise conditions. As a result, thousands of data items may be retrieved (an overabundant answer) or, even worse, no data at all (an empty answer). In both cases, the queries must be reformulated to produce more significant results and, typically, a user submits many related queries before being finally satisfied. To overcome these problems, this paper proposes a unified solution in the framework of flexible queries with fuzzy semantics. This solution, based on the concept of semantic proximity and implemented in a tool for flexible query answering, allows the automatic reformulation of queries with empty or overabundant answers.
Funding: Supported by the basic research grant of the Special Account for Research of the Technical University of Crete for the project “Spatiotemporal queries by sketch in moving object geographic databases” (No. 80080).
Abstract: Visual queries help non-expert users extract information from spatial databases in an intuitive and natural way, making geographic information systems comprehensive and efficient for a wide range of applications. A common visual means of querying takes the form of drawings or graphs, under which many spatial ambiguities and translation errors arise. In this study, common query attributes extracted from user graphs, such as spatial topology, size, cardinality, and proximity, are handled under a conceptual moderation scheme. Thus, the system/user may concentrate on various conceptual combinations of information. Furthermore, time is incorporated to support spatiotemporal queries for changing scenes and moving objects. Arbitrary, relative, and absolute scaling is possible according to the data set and application at hand. The theoretical approach is implemented in a prototype user interface system called ShapeController. With this prototype, a user may extract scene-based relations in an automatically inferred fashion, or include single object-oriented relations when all possible relations seem redundant. Finally, a natural language description of the query is extracted, from which the user may select the desired query relations. Experimentation on a spatial database demonstrates the concepts of predefined draw objects, scaling relaxation, conceptual abstraction, and scene-, object-, and textual-oriented transitions that promote query expressiveness and restrain ambiguities.
Funding: Supported by the Program for New Century Excellent Talents in Universities (Grant No. NCET-06-0290), the National Natural Science Foundation of China (Grant Nos. 60503036 and 60773221), the National High-Tech Development 863 Program of China (Grant No. 2006AA09Z139), and the Fok Ying Tong Education Foundation Award (Grant No. 104027)
Abstract: We consider the problem of efficiently computing distributed geographical k-NN queries in an unstructured peer-to-peer (P2P) system, in which each peer is managed by an individual organization and can only communicate with its logical neighboring peers. Such queries are based on local filter query statistics and must incur as little communication cost as possible, which makes them more difficult than existing distributed k-NN queries. In particular, we aim to reduce the number of candidate peers and lower the communication cost. In this paper, we propose an efficient pruning technique to minimize the number of candidate peers that must be processed to answer the k-NN queries. Our approach is especially suitable for continuous k-NN queries when peers are updated, including changes to the ranges of peers, peers dynamically leaving or joining, and data updates within a peer. In addition, simulation results show that the proposed approach outperforms the existing Minimum Bounding Rectangle (MBR)-based query approaches, especially for continuous queries.
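For context on the MBR-based baseline that the abstract compares against, the following sketch shows the standard MINDIST pruning rule: a peer can be skipped when the minimum possible distance from the query point to its bounding rectangle exceeds the distance to the current k-th nearest neighbor. This is the generic technique, not the pruning method proposed in the paper. The peer names, the `(xmin, ymin, xmax, ymax)` layout, and the `kth_dist` parameter are assumptions made for illustration.

```python
import math
from typing import Dict, List, Tuple

Point = Tuple[float, float]
MBR = Tuple[float, float, float, float]  # (xmin, ymin, xmax, ymax)


def mindist(q: Point, mbr: MBR) -> float:
    """Smallest possible distance from q to any point inside the MBR
    (zero when q lies inside the rectangle)."""
    xmin, ymin, xmax, ymax = mbr
    dx = max(xmin - q[0], 0.0, q[0] - xmax)
    dy = max(ymin - q[1], 0.0, q[1] - ymax)
    return math.hypot(dx, dy)


def candidate_peers(q: Point, peers: Dict[str, MBR], kth_dist: float) -> List[str]:
    """Peers whose MBR could still hold a point closer than the
    current k-th nearest neighbor; all others are pruned."""
    return [name for name, mbr in peers.items() if mindist(q, mbr) <= kth_dist]


# Hypothetical peers, each summarized by the MBR of its data.
peers = {"p1": (3.0, 4.0, 5.0, 6.0), "p2": (10.0, 10.0, 12.0, 12.0)}
remaining = candidate_peers((0.0, 0.0), peers, kth_dist=6.0)
```

Only the peers left in `remaining` need to be contacted, which is how this kind of pruning reduces communication cost.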