In a cloud environment,outsourced graph data is widely used in companies,enterprises,medical institutions,and so on.Data owners and users can save costs and improve efficiency by storing large amounts of graph data on...In a cloud environment,outsourced graph data is widely used in companies,enterprises,medical institutions,and so on.Data owners and users can save costs and improve efficiency by storing large amounts of graph data on cloud servers.Servers on cloud platforms usually have some subjective or objective attacks,which make the outsourced graph data in an insecure state.The issue of privacy data protection has become an important obstacle to data sharing and usage.How to query outsourcing graph data safely and effectively has become the focus of research.Adjacency query is a basic and frequently used operation in graph,and it will effectively promote the query range and query ability if multi-keyword fuzzy search can be supported at the same time.This work proposes to protect the privacy information of outsourcing graph data by encryption,mainly studies the problem of multi-keyword fuzzy adjacency query,and puts forward a solution.In our scheme,we use the Bloom filter and encryption mechanism to build a secure index and query token,and adjacency queries are implemented through indexes and query tokens on the cloud server.Our proposed scheme is proved by formal analysis,and the performance and effectiveness of the scheme are illustrated by experimental analysis.The research results of this work will provide solid theoretical and technical support for the further popularization and application of encrypted graph data processing technology.展开更多
This article presents an innovative approach to automatic rule discovery for data transformation tasks leveraging XGBoost,a machine learning algorithm renowned for its efficiency and performance.The framework proposed...This article presents an innovative approach to automatic rule discovery for data transformation tasks leveraging XGBoost,a machine learning algorithm renowned for its efficiency and performance.The framework proposed herein utilizes the fusion of diversified feature formats,specifically,metadata,textual,and pattern features.The goal is to enhance the system’s ability to discern and generalize transformation rules fromsource to destination formats in varied contexts.Firstly,the article delves into the methodology for extracting these distinct features from raw data and the pre-processing steps undertaken to prepare the data for the model.Subsequent sections expound on the mechanism of feature optimization using Recursive Feature Elimination(RFE)with linear regression,aiming to retain the most contributive features and eliminate redundant or less significant ones.The core of the research revolves around the deployment of the XGBoostmodel for training,using the prepared and optimized feature sets.The article presents a detailed overview of the mathematical model and algorithmic steps behind this procedure.Finally,the process of rule discovery(prediction phase)by the trained XGBoost model is explained,underscoring its role in real-time,automated data transformations.By employingmachine learning and particularly,the XGBoost model in the context of Business Rule Engine(BRE)data transformation,the article underscores a paradigm shift towardsmore scalable,efficient,and less human-dependent data transformation systems.This research opens doors for further exploration into automated rule discovery systems and their applications in various sectors.展开更多
Through the mapping from UMQL ( unified multimedia query language) conditional expressions to UMQA (unified multimedia query algebra) query operations, a translation algorithm from a UMQL query to a UMQA query pla...Through the mapping from UMQL ( unified multimedia query language) conditional expressions to UMQA (unified multimedia query algebra) query operations, a translation algorithm from a UMQL query to a UMQA query plan is put forward, which can generate an equivalent UMQA internal query plan for any UMQL query. Then, to improve the execution costs of UMQA query plans effectively, equivalent UMQA translation formulae and general optimization strategies are studied, and an optimization algorithm for UMQA internal query plans is presented. This algorithm uses equivalent UMQA translation formulae to optimize query plans, and makes the optimized query plans accord with the optimization strategies as much as possible. Finally, the logic implementation methods of UMQA plans, i.e., logic implementation methods of UMQA operators, are discussed to obtain useful target data from a muifirnedia database. All of these algorithms are implemented in a UMQL prototype system. Application results show that these query processing techniques are feasible and applicable.展开更多
The idea of positional inverted index is exploited for indexing of graph database. The main idea is the use of hashing tables in order to prune a considerable portion of graph database that cannot contain the answer s...The idea of positional inverted index is exploited for indexing of graph database. The main idea is the use of hashing tables in order to prune a considerable portion of graph database that cannot contain the answer set. These tables are implemented using column-based techniques and are used to store graphs of database, frequent sub-graphs and the neighborhood of nodes. In order to exact checking of remaining graphs, the vertex invariant is used for isomorphism test which can be parallel implemented. The results of evaluation indicate that proposed method outperforms existing methods.展开更多
The problem of continuously monitoring multiple K-nearest neighbor (K-NN) queries with dynamic object and query dataset is valuable for many location-based applications. A practical method is to partition the data spa...The problem of continuously monitoring multiple K-nearest neighbor (K-NN) queries with dynamic object and query dataset is valuable for many location-based applications. A practical method is to partition the data space into grid cells, with both object and query table being indexed by this grid structure, while solving the problem by periodically joining cells of objects with queries having their influence regions intersecting the cells. In the worst case, all cells of objects will be accessed once. Object and query cache strategies are proposed to further reduce the I/O cost. With object cache strategy, queries remaining static in current processing cycle seldom need I/O cost, they can be returned quickly. The main I/O cost comes from moving queries, the query cache strategy is used to restrict their search-regions, which uses current results of queries in the main memory buffer. The queries can share not only the accessing of object pages, but also their influence regions. Theoretical analysis of the expected I/O cost is presented, with the I/O cost being about 40% that of the SEA-CNN method in the experiment results.展开更多
This paper proposes a checking method based on mutual instances and discusses three key problems in the method: how to deal with mistakes in the mutual instances and how to deal with too many or too few mutual instan...This paper proposes a checking method based on mutual instances and discusses three key problems in the method: how to deal with mistakes in the mutual instances and how to deal with too many or too few mutual instances. It provides the checking based on the weighted mutual instances considering fault tolerance, gives a way to partition the large-scale mutual instances, and proposes a process greatly reducing the manual annotation work to get more mutual instances. Intension annotation that improves the checking method is also discussed. The method is practical and effective to check subsumption relations between concept queries in different ontologies based on mutual instances.展开更多
Finding all occurrences of a twig query in an XML database is a core operation for efficient evaluation of XML queries. It is important to effiectively handle twig queries with wildcards. In this paper, a novel path-p...Finding all occurrences of a twig query in an XML database is a core operation for efficient evaluation of XML queries. It is important to effiectively handle twig queries with wildcards. In this paper, a novel path-partitioned encoding scheme is proposed for XML documents to capture paths of all elements, and a twig query is modeled as an XPattern extended from tree pattern. After definition, simplification, normalization, verification and initialization of the XPattern, both work sets and a join plan are generated. According to these measures, an effiective algorithm to answer for a twig query, called DMTwig, is designed without unnecessary elements and invalid structural joins. The algorithm can adaptively deal with twig queries with branch ([ ]), child edge (/), descendant edge (//), and wildcard (*) synthetically. We show that path-partitioned encoding scheme and XPattern guarantee the I/O and CPU optimality for twig queries. Experiments on representative data set indicate that the proposed solution performs significantly.展开更多
Skyline query processing has recently received a lot of attention in database and data mining communities. However, most existing algorithms consider how to efficiently process skyline queries from base tables. Obviou...Skyline query processing has recently received a lot of attention in database and data mining communities. However, most existing algorithms consider how to efficiently process skyline queries from base tables. Obviously, when the data size and the number of skyline queries increase, the time cost of skyline queries will increase exponentially, which will seriously influence the query efficiency. Motivated by the above, in this paper, we consider improving the query efficiency via skyline views and propose a cost-based algorithm(abbr. CA) to efficiently select the optimal set of skyline views for storage. The CA algorithm mainly includes two phases:(i) reduce the skyline views selection to the minimum steiner tree problem and obtain the approximate optimal set AOS of skyline views, and(ii) adjust AOS and produce the final optimal set FOS of skyline views based on the simulated annealing. Moreover, in order to improve the extendibility of the CA algorithm, we implement it based on the map/reduce distributed computation model in cloud computing environments. The detailed theoretical analyses and extensive experiments demonstrate that the CA algorithm is both efficient and effective.展开更多
Moving object database (MOD) engine is the foundation of Location-Based Service (LBS) information systems. Continuous queries are important in spatial-temporal reasoning of a MOD. The communication costs were the bott...Moving object database (MOD) engine is the foundation of Location-Based Service (LBS) information systems. Continuous queries are important in spatial-temporal reasoning of a MOD. The communication costs were the bottleneck for improving query efficiency until the rectangular safe region algorithm partly solved this problem. However, this algorithm can be further improved, as we demonstrate with the dynamic interval based continuous queries algorithm on moving objects. Two components, circular safe region and dynamic intervals were adopted by our algorithm. Theoretical proof and experimental results show that our algorithm substantially outperforms the traditional periodic monitoring and the rectangular safe region algorithm in terms of monitoring accuracy, reducing communication costs and server CPU time. Moreover, in our algorithm, the mobile terminals do not need to have any computational ability.展开更多
The k-median problem has attracted a number of researchers. However,few of them have considered both the dynamic environment and the issue of accuracy. In this paper,a new type of query is studied,called continuous me...The k-median problem has attracted a number of researchers. However,few of them have considered both the dynamic environment and the issue of accuracy. In this paper,a new type of query is studied,called continuous median monitoring (CMM) query. It considers the k-median problem under dynamic environment with an accuracy guarantee. A continuous group nearest neighbor based (CGB) algorithm and an average distance medoid (ADM) algorithm are proposed to solve the CMM problem. ADM is a hill climbing schemed algorithm and achieves a rapid converging speed by checking only qualified candidates. Experiments show that ADM is more efficient than CGB and outperforms the classical PAM (partitioning around medoids) and CLARANS (clustering large applications based on randomized search) algorithms with various parameter settings.展开更多
Geospatial datasets are typically available as distributed collections contributed by various government or commercial providers. Supporting the diverse needs of various users that may be accessing the same dataset fo...Geospatial datasets are typically available as distributed collections contributed by various government or commercial providers. Supporting the diverse needs of various users that may be accessing the same dataset for different applications remains a challenging issue. In order to overcome this challenge there is a clear need to develop the capabilities to take into account complicated patterns of preference describing user and/or application particularities, and use these patterns to rank query results in terms of suitability. This paper offers a demonstration on how intelligent systems can assist geospatial queries to improve retrieval accuracy by customizing results based on preference patterns. We outline the particularities of the geospatial domain and present our method and its application.展开更多
A web-based translation method for Chinese organization name is proposed.After ana-lyzing the structure of Chinese organization name,the methods of bilingual query formulation and maximum entropy based translation re-...A web-based translation method for Chinese organization name is proposed.After ana-lyzing the structure of Chinese organization name,the methods of bilingual query formulation and maximum entropy based translation re-ranking are suggested to retrieve the English translation from the web via public search engine.The experiments on Chinese university names demonstrate the validness of this approach.展开更多
For small devices like the PDAs and mobile phones, formulation of relational database queries is not as simple as using conventional devices such as the personal computers and laptops. Due to the restricted size and r...For small devices like the PDAs and mobile phones, formulation of relational database queries is not as simple as using conventional devices such as the personal computers and laptops. Due to the restricted size and resources of these smaller devices, current works mostly limit the queries that can be posed by users by having them predetermined by the developers. This limits the capability of these devices in supporting robust queries. Hence, this paper proposes a universal relation based database querying language which is targeted for small devices. The language allows formulation of relational database queries that uses minimal query terms. The formulation of the language and its structure will be described and usability test results will be presented to support the effectiveness of the language.展开更多
Efficiently querying Description Logic (DL) ontologies is becoming a vital task in various data-intensive DL applications. Considered as a basic service for answering object queries over DL ontologies, instance checki...Efficiently querying Description Logic (DL) ontologies is becoming a vital task in various data-intensive DL applications. Considered as a basic service for answering object queries over DL ontologies, instance checking can be realized by using the most specific concept (MSC) method, which converts instance checking into subsumption problems. This method, however, loses its simplicity and efficiency when applied to large and complex ontologies, as it tends to generate very large MSCs that could lead to intractable reasoning. In this paper, we propose a revision to this MSC method for DL SHI , allowing it to generate much simpler and smaller concepts that are specific enough to answer a given query. With independence between computed MSCs, scalability for query answering can also be achieved by distributing and parallelizing the computations. An empirical evaluation shows the efficacy of our revised MSC method and the significant efficiency achieved when using it for answering object queries.展开更多
Biomedical questions are usually complex and regard several different life science aspects. Numerous valuable and he- terogeneous data are increasingly available to answer such questions. Yet, they are dispersedly sto...Biomedical questions are usually complex and regard several different life science aspects. Numerous valuable and he- terogeneous data are increasingly available to answer such questions. Yet, they are dispersedly stored and difficult to be queried comprehensively. We created a Genomic and Proteomic Data Warehouse (GPDW) that integrates data provided by some of the main bioinformatics databases. It adopts a modular integrated data schema and several metadata to describe the integrated data, their sources and their location in the GPDW. Here, we present the Web application that we developed to enable any user to easily compose queries, although complex, on all data integrated in the GPDW. It is publicly available at http://www.bioinformatics.dei.polimi.it/GPKB/. Through a visual interface, the user is only required to select the types of data to be included in the query and the conditions on their values to be retrieved. Then, the Web application leverages the metadata and modular schema of the GPDW to automatically compose an efficient SQL query, run it on the GPDW and show the extracted requested data, enriched with links to external data sources. Performed tests demonstrated efficiency and usability of the developed Web application, and showed its and GPDW relevance in supporting answering biomedical questions, also difficult.展开更多
In many database applications, ranking queries may reference both text and numeric attributes, where the ranking functions are based on both semantic distances/similarities for text attributes and numeric distances fo...In many database applications, ranking queries may reference both text and numeric attributes, where the ranking functions are based on both semantic distances/similarities for text attributes and numeric distances for numeric attributes. In this paper, we propose a new method for evaluating such type of ranking queries over a relational database. By statistics and training, this method builds a mechanism that combines the semantic and numeric distances, and the mechanism can be used to balance the effects of text attributes and numeric attributes on matching a given query and tuples in database search. The basic idea of the method is to create an index based on WordNet to expand the tuple words semantically for text attributes and on the information of numeric attributes. The candidate results for a query are retrieved by the index and a simple SQL selection statement, and then top-N answers are obtained. The results of extensive experiments indicate that the performance of this new strategy is efficient and effective.展开更多
Purpose:Existing researches of predicting queries with news intents have tried to extract the classification features from external knowledge bases,this paper tries to present how to apply features extracted from quer...Purpose:Existing researches of predicting queries with news intents have tried to extract the classification features from external knowledge bases,this paper tries to present how to apply features extracted from query logs for automatic identification of news queries without using any external resources.Design/methodology/approach:First,we manually labeled 1,220 news queries from Sogou.com.Based on the analysis of these queries,we then identified three features of news queries in terms of query content,time of query occurrence and user click behavior.Afterwards,we used 12 effective features proposed in literature as baseline and conducted experiments based on the support vector machine(SVM)classifier.Finally,we compared the impacts of the features used in this paper on the identification of news queries.Findings:Compared with baseline features,the F-score has been improved from 0.6414 to0.8368 after the use of three newly-identified features,among which the burst point(bst)was the most effective while predicting news queries.In addition,query expression(qes)was more useful than query terms,and among the click behavior-based features,news URL was the most effective one.Research limitations:Analyses based on features extracted from query logs might lead to produce limited results.Instead of short queries,the segmentation tool used in this study has been more widely applied for long texts.Practical implications:The research will be helpful for general-purpose search engines to address search intents for news events.Originality/value:Our approach provides a new and different perspective in recognizing queries with news intent without such large news corpora as blogs or Twitter.展开更多
Kubernetes is an open-source container management tool which automates container deployment,container load balancing and container(de)scaling,including Horizontal Pod Autoscaler(HPA),Vertical Pod Autoscaler(VPA).HPA e...Kubernetes is an open-source container management tool which automates container deployment,container load balancing and container(de)scaling,including Horizontal Pod Autoscaler(HPA),Vertical Pod Autoscaler(VPA).HPA enables flawless operation,interactively scaling the number of resource units,or pods,without downtime.Default Resource Metrics,such as CPU and memory use of host machines and pods,are monitored by Kubernetes.Cloud Computing has emerged as a platform for individuals beside the corporate sector.It provides cost-effective infrastructure,platform and software services in a shared environment.On the other hand,the emergence of industry 4.0 brought new challenges for the adaptability and infusion of cloud computing.As the global work environment is adapting constituents of industry 4.0 in terms of robotics,artificial intelligence and IoT devices,it is becoming eminent that one emerging challenge is collaborative schematics.Provision of such autonomous mechanism that can develop,manage and operationalize digital resources like CoBots to perform tasks in a distributed and collaborative cloud environment for optimized utilization of resources,ensuring schedule completion.Collaborative schematics are also linked with Bigdata management produced by large scale industry 4.0 setups.Different use cases and simulation results showed a significant improvement in Pod CPU utilization,latency,and throughput over Kubernetes environment.展开更多
The query optimizer uses cost-based optimization to create an execution plan with the least cost,which also consumes the least amount of resources.The challenge of query optimization for relational database systems is...The query optimizer uses cost-based optimization to create an execution plan with the least cost,which also consumes the least amount of resources.The challenge of query optimization for relational database systems is a combinatorial optimization problem,which renders exhaustive search impossible as query sizes rise.Increases in CPU performance have surpassed main memory,and disk access speeds in recent decades,allowing data compression to be used—strategies for improving database performance systems.For performance enhancement,compression and query optimization are the two most factors.Compression reduces the volume of data,whereas query optimization minimizes execution time.Compressing the database reduces memory requirement,data takes less time to load into memory,fewer buffer missing occur,and the size of intermediate results is more diminutive.This paper performed query optimization on the graph database in a cloud dew environment by considering,which requires less time to execute a query.The factors compression and query optimization improve the performance of the databases.This research compares the performance of MySQL and Neo4j databases in terms of memory usage and execution time running on cloud dew servers.展开更多
Cloud computingmakes dynamic resource provisioning more accessible.Monitoring a functioning service is crucial,and changes are made when particular criteria are surpassed.This research explores the decentralized multi...Cloud computingmakes dynamic resource provisioning more accessible.Monitoring a functioning service is crucial,and changes are made when particular criteria are surpassed.This research explores the decentralized multi-cloud environment for allocating resources and ensuring the Quality of Service(QoS),estimating the required resources,and modifying allotted resources depending on workload and parallelism due to resources.Resource allocation is a complex challenge due to the versatile service providers and resource providers.The engagement of different service and resource providers needs a cooperation strategy for a sustainable quality of service.The objective of a coherent and rational resource allocation is to attain the quality of service.It also includes identifying critical parameters to develop a resource allocation mechanism.A framework is proposed based on the specified parameters to formulate a resource allocation process in a decentralized multi-cloud environment.The three main parameters of the proposed framework are data accessibility,optimization,and collaboration.Using an optimization technique,these three segments are further divided into subsets for resource allocation and long-term service quality.The CloudSim simulator has been used to validate the suggested framework.Several experiments have been conducted to find the best configurations suited for enhancing collaboration and resource allocation to achieve sustained QoS.The results support the suggested structure for a decentralized multi-cloud environment and the parameters that have been determined.展开更多
基金This research was supported in part by the Nature Science Foundation of China(Nos.62262033,61962029,61762055,62062045 and 62362042)the Jiangxi Provincial Natural Science Foundation of China(Nos.20224BAB202012,20202ACBL202005 and 20202BAB212006)+3 种基金the Science and Technology Research Project of Jiangxi Education Department(Nos.GJJ211815,GJJ2201914 and GJJ201832)the Hubei Natural Science Foundation Innovation and Development Joint Fund Project(No.2022CFD101)Xiangyang High-Tech Key Science and Technology Plan Project(No.2022ABH006848)Hubei Superior and Distinctive Discipline Group of“New Energy Vehicle and Smart Transportation”,the Project of Zhejiang Institute of Mechanical&Electrical Engineering,and the Jiangxi Provincial Social Science Foundation of China(No.23GL52D).
文摘In a cloud environment,outsourced graph data is widely used in companies,enterprises,medical institutions,and so on.Data owners and users can save costs and improve efficiency by storing large amounts of graph data on cloud servers.Servers on cloud platforms usually have some subjective or objective attacks,which make the outsourced graph data in an insecure state.The issue of privacy data protection has become an important obstacle to data sharing and usage.How to query outsourcing graph data safely and effectively has become the focus of research.Adjacency query is a basic and frequently used operation in graph,and it will effectively promote the query range and query ability if multi-keyword fuzzy search can be supported at the same time.This work proposes to protect the privacy information of outsourcing graph data by encryption,mainly studies the problem of multi-keyword fuzzy adjacency query,and puts forward a solution.In our scheme,we use the Bloom filter and encryption mechanism to build a secure index and query token,and adjacency queries are implemented through indexes and query tokens on the cloud server.Our proposed scheme is proved by formal analysis,and the performance and effectiveness of the scheme are illustrated by experimental analysis.The research results of this work will provide solid theoretical and technical support for the further popularization and application of encrypted graph data processing technology.
文摘This article presents an innovative approach to automatic rule discovery for data transformation tasks leveraging XGBoost,a machine learning algorithm renowned for its efficiency and performance.The framework proposed herein utilizes the fusion of diversified feature formats,specifically,metadata,textual,and pattern features.The goal is to enhance the system’s ability to discern and generalize transformation rules fromsource to destination formats in varied contexts.Firstly,the article delves into the methodology for extracting these distinct features from raw data and the pre-processing steps undertaken to prepare the data for the model.Subsequent sections expound on the mechanism of feature optimization using Recursive Feature Elimination(RFE)with linear regression,aiming to retain the most contributive features and eliminate redundant or less significant ones.The core of the research revolves around the deployment of the XGBoostmodel for training,using the prepared and optimized feature sets.The article presents a detailed overview of the mathematical model and algorithmic steps behind this procedure.Finally,the process of rule discovery(prediction phase)by the trained XGBoost model is explained,underscoring its role in real-time,automated data transformations.By employingmachine learning and particularly,the XGBoost model in the context of Business Rule Engine(BRE)data transformation,the article underscores a paradigm shift towardsmore scalable,efficient,and less human-dependent data transformation systems.This research opens doors for further exploration into automated rule discovery systems and their applications in various sectors.
基金The National High Technology Research and Development Program of China(863 Program) (No.2006AA01Z430)
文摘Through the mapping from UMQL ( unified multimedia query language) conditional expressions to UMQA (unified multimedia query algebra) query operations, a translation algorithm from a UMQL query to a UMQA query plan is put forward, which can generate an equivalent UMQA internal query plan for any UMQL query. Then, to improve the execution costs of UMQA query plans effectively, equivalent UMQA translation formulae and general optimization strategies are studied, and an optimization algorithm for UMQA internal query plans is presented. This algorithm uses equivalent UMQA translation formulae to optimize query plans, and makes the optimized query plans accord with the optimization strategies as much as possible. Finally, the logic implementation methods of UMQA plans, i.e., logic implementation methods of UMQA operators, are discussed to obtain useful target data from a muifirnedia database. All of these algorithms are implemented in a UMQL prototype system. Application results show that these query processing techniques are feasible and applicable.
文摘The idea of positional inverted index is exploited for indexing of graph database. The main idea is the use of hashing tables in order to prune a considerable portion of graph database that cannot contain the answer set. These tables are implemented using column-based techniques and are used to store graphs of database, frequent sub-graphs and the neighborhood of nodes. In order to exact checking of remaining graphs, the vertex invariant is used for isomorphism test which can be parallel implemented. The results of evaluation indicate that proposed method outperforms existing methods.
基金Project (No.ABA048) supported by the Natural Science Foundationof Hubei Province,China
文摘The problem of continuously monitoring multiple K-nearest neighbor (K-NN) queries with dynamic object and query dataset is valuable for many location-based applications. A practical method is to partition the data space into grid cells, with both object and query table being indexed by this grid structure, while solving the problem by periodically joining cells of objects with queries having their influence regions intersecting the cells. In the worst case, all cells of objects will be accessed once. Object and query cache strategies are proposed to further reduce the I/O cost. With object cache strategy, queries remaining static in current processing cycle seldom need I/O cost, they can be returned quickly. The main I/O cost comes from moving queries, the query cache strategy is used to restrict their search-regions, which uses current results of queries in the main memory buffer. The queries can share not only the accessing of object pages, but also their influence regions. Theoretical analysis of the expected I/O cost is presented, with the I/O cost being about 40% that of the SEA-CNN method in the experiment results.
基金Supported by the National Natural Sciences Foundation of China(60373066 ,60425206 ,90412003) , National Grand Fundamental Research 973 Pro-gramof China(2002CB312000) , National Research Foundation for the Doctoral Pro-gramof Higher Education of China (20020286004)
文摘This paper proposes a checking method based on mutual instances and discusses three key problems in the method: how to deal with mistakes in the mutual instances and how to deal with too many or too few mutual instances. It provides the checking based on the weighted mutual instances considering fault tolerance, gives a way to partition the large-scale mutual instances, and proposes a process greatly reducing the manual annotation work to get more mutual instances. Intension annotation that improves the checking method is also discussed. The method is practical and effective to check subsumption relations between concept queries in different ontologies based on mutual instances.
基金supported by the National High-Tech Research and Development Plan of China (Grant No.2005AA4Z3030)
文摘Finding all occurrences of a twig query in an XML database is a core operation for efficient evaluation of XML queries. It is important to effiectively handle twig queries with wildcards. In this paper, a novel path-partitioned encoding scheme is proposed for XML documents to capture paths of all elements, and a twig query is modeled as an XPattern extended from tree pattern. After definition, simplification, normalization, verification and initialization of the XPattern, both work sets and a join plan are generated. According to these measures, an effiective algorithm to answer for a twig query, called DMTwig, is designed without unnecessary elements and invalid structural joins. The algorithm can adaptively deal with twig queries with branch ([ ]), child edge (/), descendant edge (//), and wildcard (*) synthetically. We show that path-partitioned encoding scheme and XPattern guarantee the I/O and CPU optimality for twig queries. Experiments on representative data set indicate that the proposed solution performs significantly.
基金supported by the National Natural Science Foundation of China(No.61772366)the Natural Science Foundation of Shanghai(No.17ZR1445900)the program of Further Accelerating the Development of Chinese Medicine Three Year Action of Shanghai(2014-2016)ZY3-CCCX-3-6002
文摘Skyline query processing has recently received a lot of attention in database and data mining communities. However, most existing algorithms consider how to efficiently process skyline queries from base tables. Obviously, when the data size and the number of skyline queries increase, the time cost of skyline queries will increase exponentially, which will seriously influence the query efficiency. Motivated by the above, in this paper, we consider improving the query efficiency via skyline views and propose a cost-based algorithm(abbr. CA) to efficiently select the optimal set of skyline views for storage. The CA algorithm mainly includes two phases:(i) reduce the skyline views selection to the minimum steiner tree problem and obtain the approximate optimal set AOS of skyline views, and(ii) adjust AOS and produce the final optimal set FOS of skyline views based on the simulated annealing. Moreover, in order to improve the extendibility of the CA algorithm, we implement it based on the map/reduce distributed computation model in cloud computing environments. The detailed theoretical analyses and extensive experiments demonstrate that the CA algorithm is both efficient and effective.
文摘Moving object database (MOD) engine is the foundation of Location-Based Service (LBS) information systems. Continuous queries are important in spatial-temporal reasoning of a MOD. The communication costs were the bottleneck for improving query efficiency until the rectangular safe region algorithm partly solved this problem. However, this algorithm can be further improved, as we demonstrate with the dynamic interval based continuous queries algorithm on moving objects. Two components, circular safe region and dynamic intervals were adopted by our algorithm. Theoretical proof and experimental results show that our algorithm substantially outperforms the traditional periodic monitoring and the rectangular safe region algorithm in terms of monitoring accuracy, reducing communication costs and server CPU time. Moreover, in our algorithm, the mobile terminals do not need to have any computational ability.
文摘The k-median problem has attracted a number of researchers. However,few of them have considered both the dynamic environment and the issue of accuracy. In this paper,a new type of query is studied,called continuous median monitoring (CMM) query. It considers the k-median problem under dynamic environment with an accuracy guarantee. A continuous group nearest neighbor based (CGB) algorithm and an average distance medoid (ADM) algorithm are proposed to solve the CMM problem. ADM is a hill climbing schemed algorithm and achieves a rapid converging speed by checking only qualified candidates. Experiments show that ADM is more efficient than CGB and outperforms the classical PAM (partitioning around medoids) and CLARANS (clustering large applications based on randomized search) algorithms with various parameter settings.
文摘Geospatial datasets are typically available as distributed collections contributed by various government or commercial providers. Supporting the diverse needs of various users that may be accessing the same dataset for different applications remains a challenging issue. In order to overcome this challenge there is a clear need to develop the capabilities to take into account complicated patterns of preference describing user and/or application particularities, and use these patterns to rank query results in terms of suitability. This paper offers a demonstration on how intelligent systems can assist geospatial queries to improve retrieval accuracy by customizing results based on preference patterns. We outline the particularities of the geospatial domain and present our method and its application.
基金Supported by National Natural Science Foundation of China (No.60736044 & 60773066)the Post Doctorial Funds of Heilongjiang
文摘A web-based translation method for Chinese organization name is proposed.After ana-lyzing the structure of Chinese organization name,the methods of bilingual query formulation and maximum entropy based translation re-ranking are suggested to retrieve the English translation from the web via public search engine.The experiments on Chinese university names demonstrate the validness of this approach.
文摘For small devices like the PDAs and mobile phones, formulation of relational database queries is not as simple as using conventional devices such as the personal computers and laptops. Due to the restricted size and resources of these smaller devices, current works mostly limit the queries that can be posed by users by having them predetermined by the developers. This limits the capability of these devices in supporting robust queries. Hence, this paper proposes a universal relation based database querying language which is targeted for small devices. The language allows formulation of relational database queries that uses minimal query terms. The formulation of the language and its structure will be described and usability test results will be presented to support the effectiveness of the language.
文摘Efficiently querying Description Logic (DL) ontologies is becoming a vital task in various data-intensive DL applications. Considered as a basic service for answering object queries over DL ontologies, instance checking can be realized by using the most specific concept (MSC) method, which converts instance checking into subsumption problems. This method, however, loses its simplicity and efficiency when applied to large and complex ontologies, as it tends to generate very large MSCs that could lead to intractable reasoning. In this paper, we propose a revision to this MSC method for DL SHI , allowing it to generate much simpler and smaller concepts that are specific enough to answer a given query. With independence between computed MSCs, scalability for query answering can also be achieved by distributing and parallelizing the computations. An empirical evaluation shows the efficacy of our revised MSC method and the significant efficiency achieved when using it for answering object queries.
文摘Biomedical questions are usually complex and regard several different life science aspects. Numerous valuable and he- terogeneous data are increasingly available to answer such questions. Yet, they are dispersedly stored and difficult to be queried comprehensively. We created a Genomic and Proteomic Data Warehouse (GPDW) that integrates data provided by some of the main bioinformatics databases. It adopts a modular integrated data schema and several metadata to describe the integrated data, their sources and their location in the GPDW. Here, we present the Web application that we developed to enable any user to easily compose queries, although complex, on all data integrated in the GPDW. It is publicly available at http://www.bioinformatics.dei.polimi.it/GPKB/. Through a visual interface, the user is only required to select the types of data to be included in the query and the conditions on their values to be retrieved. Then, the Web application leverages the metadata and modular schema of the GPDW to automatically compose an efficient SQL query, run it on the GPDW and show the extracted requested data, enriched with links to external data sources. Performed tests demonstrated efficiency and usability of the developed Web application, and showed its and GPDW relevance in supporting answering biomedical questions, also difficult.
文摘In many database applications, ranking queries may reference both text and numeric attributes, where the ranking functions are based on both semantic distances/similarities for text attributes and numeric distances for numeric attributes. In this paper, we propose a new method for evaluating such type of ranking queries over a relational database. By statistics and training, this method builds a mechanism that combines the semantic and numeric distances, and the mechanism can be used to balance the effects of text attributes and numeric attributes on matching a given query and tuples in database search. The basic idea of the method is to create an index based on WordNet to expand the tuple words semantically for text attributes and on the information of numeric attributes. The candidate results for a query are retrieved by the index and a simple SQL selection statement, and then top-N answers are obtained. The results of extensive experiments indicate that the performance of this new strategy is efficient and effective.
基金supported by the Social Science Planning Foundation of Chongqing(Grant No.:2011QNCB28)
文摘Purpose:Existing researches of predicting queries with news intents have tried to extract the classification features from external knowledge bases,this paper tries to present how to apply features extracted from query logs for automatic identification of news queries without using any external resources.Design/methodology/approach:First,we manually labeled 1,220 news queries from Sogou.com.Based on the analysis of these queries,we then identified three features of news queries in terms of query content,time of query occurrence and user click behavior.Afterwards,we used 12 effective features proposed in literature as baseline and conducted experiments based on the support vector machine(SVM)classifier.Finally,we compared the impacts of the features used in this paper on the identification of news queries.Findings:Compared with baseline features,the F-score has been improved from 0.6414 to0.8368 after the use of three newly-identified features,among which the burst point(bst)was the most effective while predicting news queries.In addition,query expression(qes)was more useful than query terms,and among the click behavior-based features,news URL was the most effective one.Research limitations:Analyses based on features extracted from query logs might lead to produce limited results.Instead of short queries,the segmentation tool used in this study has been more widely applied for long texts.Practical implications:The research will be helpful for general-purpose search engines to address search intents for news events.Originality/value:Our approach provides a new and different perspective in recognizing queries with news intent without such large news corpora as blogs or Twitter.
文摘Kubernetes is an open-source container management tool which automates container deployment,container load balancing and container(de)scaling,including Horizontal Pod Autoscaler(HPA),Vertical Pod Autoscaler(VPA).HPA enables flawless operation,interactively scaling the number of resource units,or pods,without downtime.Default Resource Metrics,such as CPU and memory use of host machines and pods,are monitored by Kubernetes.Cloud Computing has emerged as a platform for individuals beside the corporate sector.It provides cost-effective infrastructure,platform and software services in a shared environment.On the other hand,the emergence of industry 4.0 brought new challenges for the adaptability and infusion of cloud computing.As the global work environment is adapting constituents of industry 4.0 in terms of robotics,artificial intelligence and IoT devices,it is becoming eminent that one emerging challenge is collaborative schematics.Provision of such autonomous mechanism that can develop,manage and operationalize digital resources like CoBots to perform tasks in a distributed and collaborative cloud environment for optimized utilization of resources,ensuring schedule completion.Collaborative schematics are also linked with Bigdata management produced by large scale industry 4.0 setups.Different use cases and simulation results showed a significant improvement in Pod CPU utilization,latency,and throughput over Kubernetes environment.
文摘The query optimizer uses cost-based optimization to create an execution plan with the least cost,which also consumes the least amount of resources.The challenge of query optimization for relational database systems is a combinatorial optimization problem,which renders exhaustive search impossible as query sizes rise.Increases in CPU performance have surpassed main memory,and disk access speeds in recent decades,allowing data compression to be used—strategies for improving database performance systems.For performance enhancement,compression and query optimization are the two most factors.Compression reduces the volume of data,whereas query optimization minimizes execution time.Compressing the database reduces memory requirement,data takes less time to load into memory,fewer buffer missing occur,and the size of intermediate results is more diminutive.This paper performed query optimization on the graph database in a cloud dew environment by considering,which requires less time to execute a query.The factors compression and query optimization improve the performance of the databases.This research compares the performance of MySQL and Neo4j databases in terms of memory usage and execution time running on cloud dew servers.
文摘Cloud computingmakes dynamic resource provisioning more accessible.Monitoring a functioning service is crucial,and changes are made when particular criteria are surpassed.This research explores the decentralized multi-cloud environment for allocating resources and ensuring the Quality of Service(QoS),estimating the required resources,and modifying allotted resources depending on workload and parallelism due to resources.Resource allocation is a complex challenge due to the versatile service providers and resource providers.The engagement of different service and resource providers needs a cooperation strategy for a sustainable quality of service.The objective of a coherent and rational resource allocation is to attain the quality of service.It also includes identifying critical parameters to develop a resource allocation mechanism.A framework is proposed based on the specified parameters to formulate a resource allocation process in a decentralized multi-cloud environment.The three main parameters of the proposed framework are data accessibility,optimization,and collaboration.Using an optimization technique,these three segments are further divided into subsets for resource allocation and long-term service quality.The CloudSim simulator has been used to validate the suggested framework.Several experiments have been conducted to find the best configurations suited for enhancing collaboration and resource allocation to achieve sustained QoS.The results support the suggested structure for a decentralized multi-cloud environment and the parameters that have been determined.