Query processing in distributed database management systems (DBMSs) faces more challenges than in a single-node DBMS, where query optimization is already an NP-hard problem. These challenges include more operators and more factors in cost models and metadata. Learned query optimizers, developed mainly for single-node DBMSs, have received attention because they can capture data distributions and offer flexible ways to avoid hand-crafted rules during refinement and adaptation to new hardware. In this paper, we focus on extending learned query optimizers to distributed DBMSs. Specifically, we propose one possible but general architecture for a learned query optimizer in the distributed context and highlight how it differs from learned optimizers for single-node DBMSs. In addition, we discuss the challenges and possible solutions.
The query optimizer uses cost-based optimization to create the execution plan with the least cost, which also consumes the fewest resources. Query optimization for relational database systems is a combinatorial optimization problem, which makes exhaustive search infeasible as query sizes grow. In recent decades, CPU performance has improved faster than main-memory and disk access speeds, which makes data compression a viable strategy for improving database system performance. Compression and query optimization are the two most important factors for performance enhancement: compression reduces the volume of data, whereas query optimization minimizes execution time. Compressing the database reduces memory requirements, lets data load into memory faster, causes fewer buffer misses, and shrinks intermediate results. This paper performs query optimization on a graph database in a cloud dew environment by considering …, which requires less time to execute a query. Compression and query optimization together improve database performance. This research compares the memory usage and execution time of MySQL and Neo4j databases running on cloud dew servers.
Cloud computing makes dynamic resource provisioning more accessible. Monitoring a running service is crucial, and adjustments are made when particular thresholds are exceeded. This research explores resource allocation in a decentralized multi-cloud environment. The goals are to ensure Quality of Service (QoS), estimate the required resources, and modify allotted resources depending on the workload and on resource parallelism. Resource allocation is a complex challenge because of the variety of service providers and resource providers. Engaging different service and resource providers requires a cooperation strategy for sustainable QoS. Coherent and rational resource allocation aims to attain QoS, and it also involves identifying the critical parameters for developing a resource allocation mechanism. A framework is proposed based on the specified parameters to formulate a resource allocation process in a decentralized multi-cloud environment. The three main parameters of the proposed framework are data accessibility, optimization, and collaboration. Using an optimization technique, these three segments are further divided into subsets for resource allocation and long-term service quality. The CloudSim simulator has been used to validate the proposed framework. Several experiments have been conducted to find the configurations best suited to enhancing collaboration and resource allocation for sustained QoS. The results support the proposed structure for a decentralized multi-cloud environment and the parameters that have been identified.
Dynamic programming (DP) is an effective query optimization approach for selecting an appropriate join order for multi-table joins in a relational database management system (RDBMS). This method was extended and made applicable to distributed DBMSs (D-DBMSs). The structure of the optimal solution was first characterized according to how tables and data are distributed. The recurrence relations between a problem and its sub-problems were then defined recursively. DP in a D-DBMS has the same time complexity as in a centralized DBMS, yet it can solve the much more sophisticated optimization problem of multi-table joins in a D-DBMS. Experiments have demonstrated the effectiveness of this optimization strategy.
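A minimal sketch of the centralized DP that the abstract starts from may help. It is a Selinger-style search over subsets of relations that keeps the cheapest left-deep plan for every subset. The cost model is an assumption for illustration: the sum of intermediate result sizes, with independent pairwise join selectivities and a cross product wherever no predicate exists. The function name `dp_join_order` is hypothetical. The distributed extension, which would also track the site holding each intermediate result, is not shown.

```python
from itertools import combinations

def dp_join_order(cards, sel):
    """Return (cost, plan) of the cheapest left-deep join order.

    cards: relation name -> estimated row count.
    sel:   frozenset({r1, r2}) -> join selectivity; a missing pair means a
           cross product (selectivity 1.0).
    Cost (assumed model): sum of the sizes of all intermediate results.
    """
    rels = list(cards)
    # best[subset] = (cost, result_size, plan); single relations cost nothing.
    best = {frozenset([r]): (0.0, float(cards[r]), r) for r in rels}
    for k in range(2, len(rels) + 1):
        for subset in combinations(rels, k):
            s = frozenset(subset)
            for r in subset:
                # Optimal substructure: extend the best plan for s - {r} by r.
                left = s - {r}
                lcost, lsize, lplan = best[left]
                f = 1.0
                for other in left:
                    f *= sel.get(frozenset([other, r]), 1.0)
                size = lsize * cards[r] * f
                cost = lcost + size
                if s not in best or cost < best[s][0]:
                    best[s] = (cost, size, (lplan, r))
    cost, _, plan = best[frozenset(rels)]
    return cost, plan

# Hypothetical statistics for four relations.
cards = {"A": 1000, "B": 10, "C": 100, "D": 50}
sel = {frozenset("AB"): 0.01, frozenset("BC"): 0.1, frozenset("CD"): 0.02}
cost, plan = dp_join_order(cards, sel)
print(cost, plan)
```

Subset enumeration is exponential in the number of relations, which is why the abstract's claim that the distributed variant keeps the same time complexity as the centralized one matters.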
The rigid structure of traditional relational databases leads to data redundancy, which seriously reduces query efficiency and prevents effective management of massive data. To solve this problem, we use distributed storage and parallel computing technology to query RDF data. To achieve efficient storage and retrieval of large-scale RDF data, we combine the strengths of the relational storage model with those of distributed query processing. To overcome the disadvantages of storing and querying RDF data, we design and implement a breadth-first path search algorithm for keyword queries on a distributed platform. We run the LUBM query statements on each of the selected data sets. In the experiments, we compare query response times under different conditions to evaluate the feasibility and correctness of our approach. The results show that the proposed scheme can reduce storage cost and improve query efficiency.
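The keyword-driven breadth-first path search can be sketched on a single machine. Nodes whose IRI contains the first keyword seed a multi-source BFS over the triple graph, and the search stops at the first node matching the second keyword, so the returned path is a shortest one in hops. Substring matching on IRIs, undirected traversal that ignores predicates, the function name `keyword_path`, and the example triples are all assumptions. The distributed partitioning described in the abstract is not modeled.

```python
from collections import deque

def keyword_path(triples, src_kw, dst_kw):
    """Shortest hop path from any node containing src_kw to the nearest
    node containing dst_kw, or None if no such path exists.

    triples: iterable of (subject, predicate, object); edges are traversed
    in both directions and predicates are ignored (simplifying assumption).
    """
    adj = {}
    for s, _p, o in triples:
        adj.setdefault(s, []).append(o)
        adj.setdefault(o, []).append(s)
    starts = [n for n in adj if src_kw in n]
    parent = {n: None for n in starts}  # also serves as the visited set
    queue = deque(starts)
    while queue:
        node = queue.popleft()
        if dst_kw in node:
            # Walk parent pointers back to a start node.
            path = []
            while node is not None:
                path.append(node)
                node = parent[node]
            return path[::-1]
        for nxt in adj[node]:
            if nxt not in parent:
                parent[nxt] = node
                queue.append(nxt)
    return None

# Hypothetical LUBM-flavoured triples.
triples = [
    ("ex:Alice", "ex:worksFor", "ex:Dept0"),
    ("ex:Dept0", "ex:subOrganizationOf", "ex:Univ0"),
    ("ex:Bob", "ex:memberOf", "ex:Dept0"),
]
print(keyword_path(triples, "Alice", "Univ"))  # ['ex:Alice', 'ex:Dept0', 'ex:Univ0']
```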
Kubernetes is an open-source container management tool that automates container deployment, container load balancing, and container (de)scaling through mechanisms such as the Horizontal Pod Autoscaler (HPA) and the Vertical Pod Autoscaler (VPA). HPA enables seamless operation by dynamically scaling the number of resource units, or pods, without downtime. Kubernetes monitors default resource metrics, such as the CPU and memory use of host machines and pods. Cloud computing has emerged as a platform for individuals as well as the corporate sector, providing cost-effective infrastructure, platform, and software services in a shared environment. On the other hand, the emergence of Industry 4.0 has brought new challenges for the adoption and integration of cloud computing. As the global work environment adopts Industry 4.0 components such as robotics, artificial intelligence, and IoT devices, it is becoming evident that collaborative schematics are one emerging challenge. This challenge is to provide an autonomous mechanism that can develop, manage, and operationalize digital resources such as CoBots. These resources must perform tasks in a distributed and collaborative cloud environment, use resources optimally, and complete on schedule. Collaborative schematics are also linked with managing the big data produced by large-scale Industry 4.0 setups. Different use cases and simulation results showed a significant improvement in pod CPU utilization, latency, and throughput over the Kubernetes environment.
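For context on the HPA baseline mentioned above, the upstream Kubernetes documentation describes the core scaling rule as desiredReplicas = ceil(currentReplicas × currentMetricValue / desiredMetricValue). Scaling is skipped when that ratio is within a tolerance of 1.0 (0.1 by default). The sketch below restates that rule. The function name and the default min/max replica bounds are illustrative choices, and the paper's own improvements are not represented.

```python
import math

def desired_replicas(current_replicas, current_metric, target_metric,
                     min_replicas=1, max_replicas=10, tolerance=0.1):
    """Replica count per the documented HPA ratio rule (sketch).

    min_replicas/max_replicas stand in for the HPA spec bounds; the
    defaults here are illustrative, not Kubernetes defaults.
    """
    ratio = current_metric / target_metric
    if abs(ratio - 1.0) <= tolerance:
        return current_replicas  # close enough to target: no scaling
    desired = math.ceil(current_replicas * ratio)
    return max(min_replicas, min(max_replicas, desired))

# 4 pods at 90% average CPU against a 50% target -> ceil(4 * 1.8) = 8 pods.
print(desired_replicas(4, 90, 50))
```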
Quality traceability plays an essential role in the assembly and welding of offshore platform blocks. Improving the welding quality traceability system helps improve the durability of offshore platforms and the process level of the offshore industry. Currently, quality management remains at an early stage of informatization, and welding quality data are not effectively tracked and recorded. When welding defects are encountered, it is difficult to determine the root cause rapidly and accurately from complex and scattered quality data. This paper proposes a composite welding quality traceability model for the offshore platform block construction process. The model contains a quality early-warning method based on long short-term memory (LSTM) and an optimization algorithm for backtracking queries over quality data. Once the early-warning model is trained and the query optimization algorithm is implemented, the traceability model can help enterprises rapidly identify and locate quality problems. Furthermore, the model and the quality traceability algorithm are checked against cases under actual working conditions. Verification analyses suggest that the proposed early-warning model for welding quality and the algorithm for optimizing backtracking queries are effective and can be applied to the actual construction process.
We developed a parallel object-relational DBMS named PORLES. It uses the BSP model as its parallel computing model and monoid calculus as the basis of its data model. In this paper, we describe its data model, parallel query optimization, transaction processing system, and parallel access method in detail.
This paper presents an extent join for computing path expressions that contain parent-child and ancestor-descendant operations. It also presents two path expression optimization rules, path-shortening and path-complementing. Path-shortening reduces the number of joins by shortening the path, while path-complementing optimizes path execution by computing the original expression through an equivalent complementary path expression. Experimental results show that the proposed algorithms are more efficient than traditional algorithms.
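The containment tests that an extent join evaluates can be illustrated with region (interval) encoding. This is a common XML numbering scheme assumed here, not one taken from the abstract. Each element gets a (start, end) interval from a depth-first traversal plus its depth. Ancestor-descendant then becomes interval containment, and parent-child additionally requires a depth difference of one. The names `encode` and `extent_join` are hypothetical, and the nested-loop join is deliberately simple; practical structural joins merge sorted lists instead.

```python
from typing import List, NamedTuple, Tuple

class Node(NamedTuple):
    tag: str
    start: int
    end: int
    level: int

def encode(tree) -> List[Node]:
    """Assign (start, end, level) codes to a tree given as (tag, [children])."""
    nodes: List[Node] = []
    counter = 0

    def visit(t, level):
        nonlocal counter
        tag, children = t
        counter += 1
        start = counter
        for child in children:
            visit(child, level + 1)
        counter += 1
        nodes.append(Node(tag, start, counter, level))

    visit(tree, 0)
    return nodes

def extent_join(ancestors, descendants, parent_child=False) -> List[Tuple[Node, Node]]:
    """Pairs (a, d) where a's interval strictly contains d's; with
    parent_child=True, d must also be exactly one level below a."""
    return [
        (a, d)
        for a in ancestors
        for d in descendants
        if a.start < d.start and d.end < a.end
        and (not parent_child or d.level == a.level + 1)
    ]

# Hypothetical document <a><b><c/></b><c/></a>.
doc = ("a", [("b", [("c", [])]), ("c", [])])
by_tag = {}
for n in encode(doc):
    by_tag.setdefault(n.tag, []).append(n)
print(len(extent_join(by_tag["a"], by_tag["c"])))        # a//c -> 2
print(len(extent_join(by_tag["a"], by_tag["c"], True)))  # a/c  -> 1
```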
Fundamentally, a semantic grid database brings globally distributed databases together to coordinate resource sharing and problem solving, with information given well-defined meaning. DartGrid II is an implemented database grid system whose goal is to provide a semantic solution for integrating database resources on the Web. Many algorithms have been proposed to optimize query processing in distributed database systems by minimizing the cost and/or response time of answering a query. However, query optimization in a database grid is fundamentally different from traditional distributed query optimization. These differences result from the autonomy and heterogeneity of database nodes in a database grid. Consequently, query optimization poses more challenges in a database grid than in a traditional distributed database. Following this observation, the paper presents the design of a query optimizer in DartGrid II and proposes a heuristic, dynamic, and parallel approach to processing queries in a database grid. A set of semantic tools supporting relational database integration and semantic-based information browsing has also been implemented to realize this vision.
As the popularity of XML (Extensible Markup Language) keeps growing rapidly, the management of XML-compliant structured-document databases has become an interesting and compelling research area. Query optimization for XML structured documents stands out as one of the most challenging research issues in this area. The intrinsic complexity of the underlying XML data model greatly enlarges the optimization (search) space. We therefore propose applying deterministic transformations to query expressions. These transformations prune the search space as aggressively as possible and quickly reach a sufficiently improved alternative, if not the optimal one, for each incoming query expression. This idea is not only appealing but also practically attainable. This paper first provides an overview of our optimization strategy. It then focuses on the key implementation issues of our rule-based transformation system for XML query optimization in a database environment. Our experimental performance results show that the approach is valid and effective.
Our study introduces a novel distributed query plan refinement phase in an enhanced architecture of a distributed query processing engine (DQPE). Query plan refinement uses the reusable aggregate query shipping (RAQS) approach to generate potentially efficient distributed query plans. The approach improves response time at the cost of pre-processing time. If reusing query results cannot compensate for this overhead, RAQS is no longer favorable. Therefore, a global cost estimation model is employed to choose the proper operator: RR_Agg, R_Agg, or R_Scan. To reuse the results of queries with aggregate functions in distributed query processing, a multi-level hybrid view caching (HVC) scheme is introduced. The scheme retains the advantages of both partial-match caching and aggregate query result caching. Evaluations of our solution with distributed TPC-H queries show a significant improvement in average response time.
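One reason aggregate results can be reused across queries is that SUM and COUNT are re-aggregatable while AVG is not. The sketch below is an illustration, not the RAQS or HVC design. It rolls a cached finer grouping up to a coarser one and derives AVG from the combined SUM and COUNT. The `roll_up` name, the cache layout, and the TPC-H-like region/nation keys are hypothetical.

```python
from collections import defaultdict

def roll_up(cached, keep):
    """Re-aggregate a cached finer grouping into a coarser one.

    cached: group key tuple -> (sum, count) for the finer GROUP BY.
    keep:   positions of the group-by columns to retain.
    Returns coarser key -> (sum, count, avg); AVG is derived, never cached.
    """
    acc = defaultdict(lambda: [0, 0])
    for key, (s, c) in cached.items():
        k = tuple(key[i] for i in keep)
        acc[k][0] += s
        acc[k][1] += c
    return {k: (s, c, s / c) for k, (s, c) in acc.items()}

# Hypothetical cached result of SUM/COUNT grouped by (region, nation).
cached = {
    ("ASIA", "CHINA"): (100, 4),
    ("ASIA", "JAPAN"): (50, 1),
    ("EUROPE", "FRANCE"): (30, 3),
}
# Answer "AVG grouped by region" without re-scanning the base data.
print(roll_up(cached, [0]))
```

Caching AVG alone would not allow this reuse, because the average of group averages is generally not the average of the combined group.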
The Semantic Web has emerged to make web content machine-readable, and its importance has grown with the rapid increase in the number of web pages. The Resource Description Framework (RDF) is a graph data format for storing Semantic Web data, which can be queried with the SPARQL query language. The challenge is to find the query order that yields the shortest processing time. This paper proposes the discrete Artificial Bee Colony algorithm (dABCSPARQL), based on a novel heuristic approach that reorders SPARQL queries. dABCSPARQL minimizes the processing time of queries of different shapes and sizes. The method is evaluated on chain, star, cyclic, and chain-star queries of different sizes from the Lehigh University Benchmark (LUBM) dataset. Its results are compared with those of the ARQ query engine (a SPARQL processor for Jena) and of the Ant System, Elitist Ant System, and MAX-MIN Ant System algorithms. The experiments demonstrate that the proposed method significantly reduces processing time, and for most queries its reduction rate is higher than that of the other optimization methods.
This paper proposes a systematic, efficient compilation method for query evaluation in deductive databases (DeDBs). Eliminating redundancy and minimizing the potentially relevant facts are the two key issues for DeDB efficiency. To address them, the compilation process is decomposed into two phases. The first, the pre-compilation phase, minimizes the potentially relevant facts. The second, which we call the general compilation phase, eliminates redundancy. The rule/goal graph devised by J. D. Ullman is appropriately extended and used as a uniform formalism. Two general algorithms, one for each phase, are described both intuitively and formally.
Optimal location query in road networks is a basic operation in location intelligence applications. Given a set of clients and servers on a road network, an optimal location query finds a location for a new server that optimizes an objective function computed from the locations of clients and servers. Existing works assume that servers are unlabeled and that each client visits only its nearest server. These assumptions are unrealistic and make existing work unusable in many cases. In this paper, we relax these assumptions and consider the k nearest neighbors (KNN) of clients. We introduce the KNN-based optimal location query (KOLQ) problem, which considers labeled servers and the k nearest servers of each client. We also introduce a variant called relocation KOLQ (RKOLQ), which aims to relocate an existing server to an optimal location. We propose two main analysis algorithms for these problems. Extensive experiments on real road networks illustrate the efficiency of our solutions.
Static optimization of logical queries essentially means moving selections down as far as possible when evaluating logical queries. This paper extends Ullman's Rule/Goal Graph (RGG) and introduces the P-graph. With the P-graph, a wide range of recursive logical queries can be statically optimized top-down and evaluated bottom-up, including some that are usually optimized by dynamic approaches. The paper also shows that, for some logical queries, the complexity of pushing selections down and computing bottom-up is related to the complexity of the base relations in the queries.
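The effect of pushing a selection into a recursive query can be shown on the classic ancestor program: ancestor(X, Y) :- parent(X, Y) and ancestor(X, Y) :- parent(X, Z), ancestor(Z, Y), queried with ancestor(a, Y). The first function evaluates the whole relation bottom-up with semi-naive iteration and filters at the end. The second starts from the constant and generates only facts reachable from it. This is the textbook selection-pushing idea, not the P-graph construction itself. The function names and the example facts are hypothetical.

```python
def full_closure_then_select(parent, x):
    """Semi-naive bottom-up evaluation of all ancestor facts, then the
    selection X = x. Returns (answers, number of facts derived)."""
    anc = set(parent)
    delta = set(parent)
    while delta:
        # ancestor(A, C) :- parent(A, B), ancestor(B, C), using only new facts.
        new = {(a, c) for (a, b) in parent for (b2, c) in delta if b == b2} - anc
        anc |= new
        delta = new
    return {y for (a, y) in anc if a == x}, len(anc)

def selection_pushed(parent, x):
    """Selection X = x pushed into the recursion: only facts of the form
    ancestor(x, Y) are generated. Returns (answers, number of facts)."""
    reach = set()
    frontier = {x}
    while frontier:
        nxt = {c for (b, c) in parent if b in frontier} - reach
        reach |= nxt
        frontier = nxt
    return reach, len(reach)

parent = {("a", "b"), ("b", "c"), ("c", "d"), ("x", "y"), ("y", "z")}
print(full_closure_then_select(parent, "a"))  # same answers, 9 facts derived
print(selection_pushed(parent, "a"))          # same answers, 3 facts derived
```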
We present the first efficient, sound, and complete algorithm (AOMSSQ) for optimizing multiple subspace skyline queries simultaneously. We first identify three performance problems of the naive approach (SUBSKY), which can process an arbitrary single-subspace skyline query. We then propose a cell-dominance computation algorithm (CDCA) that efficiently overcomes the drawbacks of SUBSKY. In particular, CDCA uses a novel pruning technique to dramatically decrease query time. Finally, based on CDCA and a sharing mechanism between subspaces, we present and discuss the AOMSSQ algorithm and prove it sound and complete. We also present detailed theoretical analyses and extensive experiments demonstrating that our algorithms are both efficient and effective.
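To make "subspace skyline" concrete, the sketch below answers each subspace query independently with a block-nested-loop skyline restricted to the chosen dimensions. This per-subspace recomputation is the kind of baseline that a shared multi-query algorithm aims to beat; it is neither CDCA nor AOMSSQ. The assumption that smaller values are better, the function names, and the sample points are illustrative.

```python
def dominates(p, q, dims):
    """p dominates q on dims (smaller is better): no worse everywhere,
    strictly better somewhere."""
    return all(p[d] <= q[d] for d in dims) and any(p[d] < q[d] for d in dims)

def subspace_skyline(points, dims):
    """Block-nested-loop skyline of points restricted to dims."""
    window = []  # skyline of the points processed so far
    for p in points:
        if any(dominates(w, p, dims) for w in window):
            continue
        window = [w for w in window if not dominates(p, w, dims)]
        window.append(p)
    return window

# Hypothetical hotels: (price, distance, noise).
points = [(1, 9, 5), (3, 3, 3), (9, 1, 4), (4, 4, 1), (5, 5, 5)]
for dims in [(0, 1), (2,), (0, 1, 2)]:
    print(dims, subspace_skyline(points, dims))
```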
Data quality is important in many data-driven applications, such as decision making, data analysis, and data mining. Recent studies focus on data cleaning techniques that delete or repair dirty data, which may cause information loss and introduce new inconsistencies. To avoid these problems, we propose EntityManager, a general system for managing dirty data without data cleaning. The system uses the real-world entity as its basic storage unit and retrieves query results according to users' quality requirements. It can handle all kinds of inconsistencies recognized by entity resolution. We describe the EntityManager system in detail, covering its architecture, data model, and query processing techniques. To process queries efficiently, the system adopts novel indices, a similarity operator, and query optimization techniques. Finally, we verify the efficiency and effectiveness of the system and present future research challenges.
In recent years, Apache Spark has become the de facto standard for big data processing. SparkSQL is a module that supports relational analysis on Spark with Structured Query Language (SQL) and provides convenient data processing interfaces. Despite its efficient optimizer, SparkSQL still suffers from Spark's inefficiencies, which stem from the Java virtual machine and from unnecessary data serialization and deserialization. Adopting native languages such as C++ could help avoid such bottlenecks. Systems with C++ interfaces usually achieve superior performance because they benefit from a bare-metal runtime environment and template usage. However, the complexity of native languages also increases the required programming and debugging effort. In this work, we present LotusSQL, an engine that provides SQL support for dataset abstraction on the native backend Lotus. We employ a convenient SQL processing framework to handle frontend jobs. Advanced query optimization techniques are added to improve the quality of execution plans. On top of the compute engine's storage design and user interface, LotusSQL implements a set of highly efficient structured dataset operations and integrates them with the frontend. Evaluation results show that LotusSQL achieves a speedup of up to 9× on certain queries. On a standard query benchmark, it outperforms Spark SQL by more than 2× on average.
Spatial selectivity estimation is crucial for a query optimizer to choose the cheapest execution plan for a given query. This article proposes an accurate spatial selectivity estimation method based on cumulative density (CD) histograms that can handle any arbitrary spatial query window. The method first interpolates the four corner values of a query window on the continuous surface of the elevation histogram. Once these values are accurate, the selectivity can be estimated with the original logic of the CD histogram. To interpolate any corner point, we first identify the cells that can affect the value of point (x, y) in the CD histogram. These cells fall into two classes: cells within the range from (0, 0) to (x, y), and cells overlapping that range. Values of the former class can be used directly. Values of cells in the latter class are revised according to the number of vertices in the cell and the fraction of its area covered by the range from (0, 0) to (x, y). This revision makes the estimation more accurate. The CD histograms and the estimation method have been implemented in INGRES. Experimental results show that the method accurately estimates the selectivity of arbitrary query windows and can help the optimizer choose a cheaper query plan.
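The "original logic" of a CD histogram is inclusion-exclusion over the cumulative counts at the four corners of the query window: count = CD(x2, y2) - CD(x1, y2) - CD(x2, y1) + CD(x1, y1). The minimal sketch below covers point data and grid-aligned windows only. The paper's actual contribution is interpolating corner values at arbitrary positions and revising partially covered cells, and that part is not reproduced. The function names are hypothetical.

```python
def build_cd(grid):
    """grid[i][j] = number of points in cell (i, j).
    Returns cd with cd[i][j] = points in cells [0, i) x [0, j); the extra
    zero row and column make corner lookups uniform."""
    n, m = len(grid), len(grid[0])
    cd = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n):
        for j in range(m):
            cd[i + 1][j + 1] = grid[i][j] + cd[i][j + 1] + cd[i + 1][j] - cd[i][j]
    return cd

def window_count(cd, i1, j1, i2, j2):
    """Points in cells [i1, i2) x [j1, j2) from the four corner values."""
    return cd[i2][j2] - cd[i1][j2] - cd[i2][j1] + cd[i1][j1]

def selectivity(cd, i1, j1, i2, j2):
    """Fraction of all points that fall in the window."""
    total = cd[-1][-1]
    return window_count(cd, i1, j1, i2, j2) / total if total else 0.0

# Hypothetical 3 x 3 grid of point counts (11 points in total).
grid = [[1, 0, 2],
        [0, 3, 0],
        [4, 0, 1]]
cd = build_cd(grid)
print(window_count(cd, 0, 0, 2, 2), selectivity(cd, 0, 0, 2, 2))  # 4 points, 4/11
```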
Funding (paper on learned query optimizers for distributed DBMSs): partially supported by NSFC under Grant Nos. 61832001 and 62272008, and by the ZTE Industry-University-Institute Fund Project.
Funding (paper on querying RDF data with distributed storage and parallel computing): This work is supported in part by the National Natural Science Foundation of China (61728204), Innovation Funding (NJ20160028, NT2018027, NT2018028, NS2018057), the Aeronautical Science Foundation of China (2016551500), the State Key Laboratory for Smart Grid Protection and Operation Control Foundation, and the Association of Chinese Graduate Education (ACGE).
Funding: Funded by the Ministry of Industry and Information Technology of the People's Republic of China [Grant No. 2018473].
Abstract: Quality traceability plays an essential role in assembling and welding offshore platform blocks. Improving the welding quality traceability system helps improve the durability of offshore platforms and the process level of the offshore industry. Currently, quality management remains at the stage of primary information, and welding quality data are not effectively tracked and recorded. When welding defects are encountered, it is difficult to rapidly and accurately determine the root cause of the problem from complex and scattered quality data. In this paper, a composite welding quality traceability model for the offshore platform block construction process is proposed; it contains a quality early-warning method based on long short-term memory (LSTM) and a query optimization algorithm for backtracking quality data. By training the early-warning model and implementing the query optimization algorithm, the quality traceability model can help enterprises rapidly identify and locate quality problems. Furthermore, the model and the quality traceability algorithm are verified with cases under actual working conditions. Verification analyses suggest that the proposed early-warning model for welding quality and the algorithm for optimizing backtracking queries are effective and can be applied to the actual construction process.
Abstract: We developed a parallel object-relational DBMS named PORLES. It uses the BSP model as its parallel computing model and monoid calculus as the basis of its data model. In this paper, we introduce its data model, parallel query optimization, transaction processing system, and parallel access method in detail.
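The paper's own code is not shown. The sketch below assumes only the textbook idea behind monoid calculus: an aggregate is a monoid (an identity element plus an associative merge), so it can be evaluated independently on each partition and combined after a BSP barrier. The "processors" are simulated sequentially, and all names are hypothetical.

```python
from dataclasses import dataclass
from functools import reduce
from typing import Any, Callable

@dataclass(frozen=True)
class Monoid:
    zero: Any                          # identity element
    merge: Callable[[Any, Any], Any]   # associative binary operation

SUM = Monoid(0, lambda a, b: a + b)
MAX = Monoid(float("-inf"), max)
UNION = Monoid(frozenset(), lambda a, b: a | b)

def bsp_aggregate(partitions, unit, monoid):
    """Evaluate a monoid homomorphism over partitioned data in two BSP supersteps."""
    # Superstep 1: each simulated processor folds only its local partition.
    partials = [reduce(monoid.merge, (unit(x) for x in part), monoid.zero)
                for part in partitions]
    # Barrier; superstep 2: partial results go to one processor and are merged.
    return reduce(monoid.merge, partials, monoid.zero)
```

With salaries partitioned as `[[3000, 5200], [4100], []]`, `bsp_aggregate(parts, lambda s: s, SUM)` gives the total, `MAX` gives the largest salary, and `SUM` with `lambda _: 1` gives a count. Associativity is what makes the partition boundaries irrelevant to the result.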
Abstract: This paper presents an extent join for computing path expressions containing parent-child and ancestor-descendant operations, together with two path expression optimization rules, path-shortening and path-complementing. Path-shortening reduces the number of joins by shortening the path, while path-complementing optimizes path execution by computing the original expression through an equivalent complementary path expression. Experimental results show that the proposed algorithms are more efficient than traditional algorithms.
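To make the parent-child and ancestor-descendant joins concrete, here is a sketch that uses the common region (interval) encoding of XML nodes. This encoding is a swapped-in standard technique, not necessarily the paper's extent join, and the path-shortening and path-complementing rules are not implemented. Node ids such as `book1` are hypothetical, and an "extent" is modeled as the list of nodes carrying one tag.

```python
def encode(tree, root):
    """Assign (start, end, level) region labels by depth-first traversal.
    tree maps a node id to its list of child ids."""
    labels = {}
    counter = 0
    def visit(node, level):
        nonlocal counter
        start = counter
        counter += 1
        for child in tree.get(node, []):
            visit(child, level + 1)
        labels[node] = (start, counter, level)
        counter += 1
    visit(root, 0)
    return labels

def structural_join(ancestors, descendants, labels, parent_child=False):
    """(a, d) pairs where a's region strictly contains d's (and, for '/', d is one level deeper)."""
    pairs = []
    for a in ancestors:
        a_start, a_end, a_level = labels[a]
        for d in descendants:
            d_start, d_end, d_level = labels[d]
            if a_start < d_start and d_end < a_end:
                if not parent_child or d_level == a_level + 1:
                    pairs.append((a, d))
    return pairs
```

The nested loop keeps the sketch short; practical structural joins sort both extents by `start` and merge them with a stack so that each extent is scanned once.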
Abstract: Fundamentally, a semantic grid database brings globally distributed databases together to coordinate resource sharing and problem solving, with information given well-defined meaning. DartGrid II is an implemented database grid system whose goal is to provide a semantic solution for integrating database resources on the Web. Although many algorithms have been proposed for optimizing query processing to minimize the cost and/or response time of answering a query in a distributed database system, the database grid query optimization problem is fundamentally different from traditional distributed query optimization. These differences are shown to be consequences of the autonomy and heterogeneity of database nodes in a database grid. Therefore, query optimization in a database grid faces more challenges than in a traditional distributed database. Following this observation, the design of a query optimizer in DartGrid II is presented, and a heuristic, dynamic, and parallel approach to processing queries in a database grid is proposed. A set of semantic tools supporting relational database integration and semantic-based information browsing has also been implemented to realize this vision.
Abstract: As the popularity of XML (eXtensible Markup Language) keeps growing rapidly, the management of XML-compliant structured-document databases has become an interesting and compelling research area. Query optimization for XML structured documents stands out as one of the most challenging research issues in this area because of the much enlarged optimization (search) space, a consequence of the intrinsic complexity of the underlying XML data model. We therefore propose to apply deterministic transformations on query expressions to prune the search space as aggressively as possible and quickly reach a sufficiently improved alternative (if not the optimal one) for each incoming query expression. This idea is not just exciting but practically attainable. This paper first provides an overview of our optimization strategy and then focuses on the key implementation issues of our rule-based transformation system for XML query optimization in a database environment. The performance results we obtained from experimentation show that our approach is valid and effective.
Funding: Partially supported by the National Basic Research 973 Program of China under Grant No. 2005CB321807 and the National High Technology Research and Development 863 Program of China under Grant Nos. 2006AA01A106 and 2006AA04Z158.
Abstract: Our study introduces a novel distributed query plan refinement phase in an enhanced architecture of a distributed query processing engine (DQPE). Query plan refinement generates a potentially efficient distributed query plan using the reusable aggregate query shipping (RAQS) approach. The approach improves response time at the cost of preprocessing time. If this overhead cannot be compensated for by reusing query results, RAQS is no longer favorable. Therefore, a global cost estimation model is employed to choose the proper operator: RR_Agg, R_Agg, or R_Scan. To reuse the results of queries with aggregate functions in distributed query processing, a multi-level hybrid view caching (HVC) scheme is introduced. The scheme retains the advantages of both partial-match caching and aggregate query result caching. With our solution, evaluations with distributed TPC-H queries show a significant improvement in average response time.
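The following sketch illustrates the partial-match idea behind reusing aggregate results. It is a deliberately simplified, single-level stand-in, not the paper's multi-level HVC scheme, and the semantics of RR_Agg, R_Agg, and R_Scan are not modeled. The class and method names are hypothetical. A cached result grouped on more columns can answer a query grouped on a subset of them by rolling up, which is valid only for distributive aggregates such as SUM and COUNT.

```python
class AggregateCache:
    """Hypothetical cache of SUM/COUNT results keyed by (table, group-by columns)."""

    def __init__(self):
        self._entries = {}  # (table, tuple of columns) -> {group key tuple: aggregate}

    def put(self, table, group_by, result):
        self._entries[(table, tuple(group_by))] = dict(result)

    def get(self, table, group_by):
        group_by = tuple(group_by)
        exact = self._entries.get((table, group_by))
        if exact is not None:
            return dict(exact), "exact"
        # Partial match: roll up any finer-grained cached result for the same table.
        for (cached_table, cached_cols), rows in self._entries.items():
            if cached_table == table and set(group_by) <= set(cached_cols):
                positions = [cached_cols.index(c) for c in group_by]
                rolled = {}
                for key, value in rows.items():
                    coarse = tuple(key[p] for p in positions)
                    rolled[coarse] = rolled.get(coarse, 0) + value
                return rolled, "partial"
        return None, "miss"  # nothing reusable: the query must be shipped and evaluated
```

A result for `lineitem` grouped by `(l_returnflag, l_linestatus)` can then serve a later query grouped by `l_returnflag` alone, or even an ungrouped total, without touching remote sites. That is the kind of reuse that has to outweigh the preprocessing overhead for RAQS to pay off.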
Abstract: The Semantic Web has emerged to make web content machine-readable, and its importance has grown with the rapid increase in the number of web pages. The Resource Description Framework (RDF) is a special data graph format in which Semantic Web data are stored, and it can be queried with the SPARQL query language. The challenge is to find the query order that yields the shortest processing time. In this paper, the discrete Artificial Bee Colony (dABCSPARQL) algorithm is proposed, based on a novel heuristic approach of reordering SPARQL queries. The dABCSPARQL algorithm minimizes the processing time of queries of different shapes and sizes. The performance of the proposed method is evaluated on chain, star, cyclic, and chain-star queries of different sizes from the Lehigh University Benchmark (LUBM) dataset. The results obtained by the proposed method are compared with those of the ARQ query engine (a SPARQL processor for Jena), the Ant System, the Elitist Ant System, and the MAX-MIN Ant System algorithms. The experiments demonstrate that the proposed method significantly reduces processing time and, for most queries, achieves a higher reduction rate than the other optimization methods.
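A discrete ABC search over triple-pattern orders can be sketched as below. It is only a sketch of the general technique under stated assumptions, not the published dABCSPARQL. The cost model is hypothetical: the sum of estimated intermediate result sizes, with a fixed join selectivity when a pattern shares a variable with earlier ones and a Cartesian product otherwise. Food sources are permutations, the neighbourhood move swaps two positions, and the colony is seeded with the order as written, so the result is never worse than the original order.

```python
import random

def order_cost(order, patterns, join_selectivity=0.01):
    """Hypothetical cost: sum of estimated intermediate sizes for a left-deep order.
    patterns[i] = (frozenset of variables, estimated cardinality)."""
    bound, size, total = set(), 0.0, 0.0
    for position, idx in enumerate(order):
        variables, cardinality = patterns[idx]
        if position == 0:
            size = float(cardinality)
        elif bound & variables:
            size *= cardinality * join_selectivity   # join on a shared variable
        else:
            size *= cardinality                      # no shared variable: Cartesian product
        total += size
        bound |= variables
    return total

def swap_neighbour(order, rng):
    """Discrete move: swap two positions of the permutation."""
    a, b = rng.sample(range(len(order)), 2)
    new = list(order)
    new[a], new[b] = new[b], new[a]
    return new

def dabc_order(patterns, colony=6, limit=5, cycles=50, seed=0):
    rng = random.Random(seed)
    n = len(patterns)
    foods = [list(range(n))] + [rng.sample(range(n), n) for _ in range(colony - 1)]
    costs = [order_cost(f, patterns) for f in foods]
    trials = [0] * colony
    best_i = min(range(colony), key=costs.__getitem__)
    best_cost, best = costs[best_i], list(foods[best_i])

    def try_improve(i):
        nonlocal best_cost, best
        candidate = swap_neighbour(foods[i], rng)
        cost = order_cost(candidate, patterns)
        if cost < costs[i]:                          # greedy selection
            foods[i], costs[i], trials[i] = candidate, cost, 0
            if cost < best_cost:
                best_cost, best = cost, list(candidate)
        else:
            trials[i] += 1

    for _ in range(cycles):
        for i in range(colony):                      # employed bees
            try_improve(i)
        fitness = [1.0 / (1.0 + c) for c in costs]
        for _ in range(colony):                      # onlookers favour cheaper sources
            try_improve(rng.choices(range(colony), weights=fitness)[0])
        for i in range(colony):                      # scouts replace exhausted sources
            if trials[i] > limit:
                foods[i] = rng.sample(range(n), n)
                costs[i] = order_cost(foods[i], patterns)
                trials[i] = 0
                if costs[i] < best_cost:
                    best_cost, best = costs[i], list(foods[i])
    return best, best_cost
```

The cardinalities used with such a sketch (for example, `?x ub:takesCourse ?c` at 20,000 matches) would in practice come from dataset statistics. The numbers in any toy run are illustrative only.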
Abstract: A systematic, efficient compilation method for query evaluation in deductive databases (DeDB) is proposed in this paper. To eliminate redundancy and to minimize the potentially relevant facts, the two key issues for DeDB efficiency, the compilation process is decomposed into two phases. The first, the pre-compilation phase, is responsible for minimizing the potentially relevant facts. The second, which we refer to as the general compilation phase, is responsible for eliminating redundancy. The rule/goal graph devised by J. D. Ullman is appropriately extended and used as a uniform formalism. Two general algorithms, corresponding to the two phases respectively, are described both intuitively and formally.
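The redundancy that such compilation aims to remove is easiest to see in recursive rules. The sketch below uses semi-naive evaluation, a standard technique and not the paper's rule/goal-graph compilation, on `anc(X, Y) :- par(X, Y).` and `anc(X, Y) :- par(X, Z), anc(Z, Y).` Naive evaluation re-derives every known fact in each round, whereas semi-naive evaluation joins only with the facts that are new since the previous round. The function names are hypothetical, and "firings" counts successful rule applications.

```python
def naive_closure(par):
    """Recompute par ∘ anc from scratch every round until nothing changes."""
    anc, firings = set(par), 0
    while True:
        derived = set(par)
        for (x, z) in par:
            for (z2, y) in anc:
                if z == z2:
                    firings += 1
                    derived.add((x, y))
        if derived == anc:
            return anc, firings
        anc = derived

def semi_naive_closure(par):
    """Join only with the delta (facts first derived in the previous round)."""
    anc, delta, firings = set(par), set(par), 0
    while delta:
        new = set()
        for (x, z) in par:
            for (z2, y) in delta:
                if z == z2:
                    firings += 1
                    if (x, y) not in anc:
                        new.add((x, y))
        anc |= new
        delta = new
    return anc, firings
```

On the chain a → b → c → d → e, both return the same 10 ancestor facts, but the naive version fires the recursive rule 20 times and the semi-naive version only 6 times, one firing per new fact.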
Funding: This paper was supported by the National Natural Science Foundation of China (Grant Nos. 61572537, U1501252).
Abstract: Optimal location query in road networks is a basic operation in location intelligence applications. Given a set of clients and servers on a road network, the purpose of an optimal location query is to find a location for a new server such that a certain objective function computed from the locations of clients and servers is optimal. Existing works assume that servers have no labels and that a client only visits its nearest server. These assumptions are unrealistic and make existing work inapplicable in many cases. In this paper, we relax these assumptions and consider the k nearest neighbours (KNN) of clients. We introduce the problem of KNN-based optimal location query (KOLQ), which considers the k nearest servers of clients and labeled servers. We also introduce a variant called relocation KOLQ (RKOLQ), which aims at relocating an existing server to an optimal location. Two main analysis algorithms are proposed for these problems. Extensive experiments on real road networks illustrate the efficiency of our proposed solutions.
Abstract: Static optimization of logical queries is, in essence, moving selections down as far as possible when evaluating logical queries. This paper extends Ullman's RGG (Rule/Goal Graph) and introduces the P-graph, with which a wide range of recursive logical queries, some of which are usually optimized by dynamic approaches, can be statically optimized top-down and evaluated bottom-up. The paper also shows that for some logical queries, the complexity of pushing selections down and computing bottom-up is related to the complexity of the base relations in the queries.
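The payoff of pushing a selection into a recursive query can be illustrated with the query `anc(b, Y)`. The P-graph itself is not reproduced here. The sketch relies on the standard fact that transitive closure can equivalently be written in left-linear form, `anc(X, Y) :- anc(X, Z), par(Z, Y)`, where the binding `X = b` carries into every recursive step. Function names are hypothetical, and the second return value counts the facts each strategy materializes.

```python
def select_after_closure(par, const):
    """Evaluate the whole ancestor relation bottom-up, then apply X = const."""
    anc = set(par)
    while True:
        new = {(x, y) for (x, z) in par for (z2, y) in anc if z == z2} - anc
        if not new:
            break
        anc |= new
    return {y for (x, y) in anc if x == const}, len(anc)

def select_pushed_down(par, const):
    """Left-linear anc(const, Y) :- anc(const, Z), par(Z, Y): derive only facts about const."""
    children = {}
    for x, y in par:
        children.setdefault(x, set()).add(y)
    answers = set(children.get(const, ()))
    frontier = set(answers)
    while frontier:
        frontier = {y for z in frontier for y in children.get(z, ())} - answers
        answers |= frontier
    return answers, len(answers)
```

With `par` holding the chain a → b → c → d and an unrelated chain x → y → z, both strategies answer `{c, d}` for `b`, but the first materializes all 9 ancestor facts while the pushed-down version materializes only the 2 answers.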
Funding: This work is supported by the NSF of the USA under Grant No. IIS-0308001, the National Natural Science Foundation of China under Grant No. 60303008, and the National Grand Fundamental Research 973 Program of China under Grant No. 2005CB321905.
Abstract: In this paper, we present the first efficient, sound, and complete algorithm (AOMSSQ) for optimizing multiple subspace skyline queries simultaneously. We first identify three performance problems of the naïve approach (SUBSKY), which can be used to process an arbitrary single-subspace skyline query. We then propose a cell-dominance computation algorithm (CDCA) that efficiently overcomes the drawbacks of SUBSKY. In particular, CDCA uses a novel pruning technique to dramatically decrease query time. Finally, based on the CDCA algorithm and a sharing mechanism between subspaces, we present and discuss the AOMSSQ algorithm and prove that it is sound and complete. We also present detailed theoretical analyses and extensive experiments demonstrating that our algorithms are both efficient and effective.
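As background for subspace skyline queries, the sketch below computes the skyline of a subspace with a block-nested-loop pass (smaller values are better). It also shows one simple form of sharing across subspaces. This is not CDCA or AOMSSQ. It relies on a known property: when no two points share a value on any attribute, every subspace skyline is contained in the full-space skyline, so the full skyline can be computed once and reused as the candidate set. The names and the distinct-value assumption are part of this illustration only.

```python
def dominates(p, q, dims):
    """p dominates q on the attribute indices in dims (minimization)."""
    return all(p[d] <= q[d] for d in dims) and any(p[d] < q[d] for d in dims)

def subspace_skyline(points, dims):
    """Block-nested-loop skyline of points restricted to dims."""
    window = []
    for p in points:
        if any(dominates(w, p, dims) for w in window):
            continue                                   # p is dominated: discard it
        window = [w for w in window if not dominates(p, w, dims)]
        window.append(p)
    return window

def skylines_shared(points, subspaces):
    """Compute the full-space skyline once, then search each subspace inside it.
    Assumes distinct values on every attribute."""
    all_dims = tuple(range(len(points[0])))
    candidates = subspace_skyline(points, all_dims)
    return {dims: subspace_skyline(candidates, dims) for dims in subspaces}
```

For hotels described as `(price, distance, rating_rank)`, a query on `(price, distance)` and another on `(distance, rating_rank)` can both be answered from the single full-space skyline instead of rescanning all points for each subspace.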
Funding: This work was partially supported by the National Key Technology Research and Development Program of the Ministry of Science and Technology of China under Grant No. 2015BAH10F01, the National Natural Science Foundation of China under Grant Nos. U1509216, 61472099, and 61133002, the Scientific Research Foundation for the Returned Overseas Chinese Scholars of Heilongjiang Province of China under Grant No. LC2016026, and the Ministry of Education (MOE)-Microsoft Key Laboratory of Natural Language Processing and Speech, Harbin Institute of Technology.
Abstract: Data quality is important in many data-driven applications, such as decision making, data analysis, and data mining. Recent studies focus on data cleaning techniques that delete or repair dirty data, which may cause information loss and introduce new inconsistencies. To avoid these problems, we propose EntityManager, a general system for managing dirty data without data cleaning. This system takes the real-world entity as its basic storage unit and retrieves query results according to users' quality requirements. The system can handle all kinds of inconsistencies recognized by entity resolution. We describe the EntityManager system in detail, covering its architecture, data model, and query processing techniques. To process queries efficiently, our system adopts novel indices, a similarity operator, and query optimization techniques. Finally, we verify the efficiency and effectiveness of this system and present future research challenges.
Abstract: In recent years, Apache Spark has become the de facto standard for big data processing. SparkSQL is a module that supports relational analysis on Spark with Structured Query Language (SQL). SparkSQL provides convenient data processing interfaces. Despite its efficient optimizer, SparkSQL still suffers from inefficiencies in Spark caused by the Java virtual machine and by unnecessary data serialization and deserialization. Adopting native languages such as C++ could help avoid such bottlenecks. Benefiting from a bare-metal runtime environment and template usage, systems with C++ interfaces usually achieve superior performance. However, the complexity of native languages also increases the required programming and debugging effort. In this work, we present LotusSQL, an engine that provides SQL support for dataset abstraction on Lotus, a native backend. We employ a convenient SQL processing framework to handle frontend jobs. Advanced query optimization techniques are added to improve the quality of execution plans. On top of the storage design and user interface of the compute engine, LotusSQL implements a set of highly efficient structured dataset operations and integrates them with the frontend. Evaluation results show that LotusSQL achieves a speedup of up to 9× on certain queries and outperforms Spark SQL on a standard query benchmark by more than 2× on average.
Funding: This work was supported by the National Natural Science Foundation of China [grant numbers 41222009, 41271405].
Abstract: Spatial selectivity estimation is crucial for a query optimizer to choose the cheapest execution plan for a given query. This article proposes an accurate spatial selectivity estimation method based on cumulative density (CD) histograms that can handle arbitrary spatial query windows. In this method, the selectivity is estimated with the original logic of the CD histogram after the four corner values of a query window have been accurately interpolated on the continuous surface of the elevation histogram. To interpolate any corner point, we first identify the cells that can affect the value of point (x, y) in the CD histogram. These cells fall into two classes: those lying within the range from (0, 0) to (x, y) and those overlapping that range. The values of the former class can be used directly, whereas the values of cells in the latter class are revised using the number of vertices in the corresponding cell and the fraction of the cell's area covered by the range from (0, 0) to (x, y). This revision makes the estimation more accurate. The CD histograms and the estimation method have been implemented in INGRES. Experimental results show that the method can accurately estimate the selectivity of arbitrary query windows and can help the optimizer choose a cheaper query plan.
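The four-corner logic of a CD histogram can be sketched as follows. This is a simplified stand-in for the paper's method. It assumes point data spread uniformly inside each cell, in which case bilinear interpolation of the cumulative grid gives the corner values exactly. The paper's revision of partially covered cells by vertex counts and area ratios is not reproduced. Grid layout, cell sizes, and function names are hypothetical, and the domain is assumed to start at (0, 0).

```python
def build_cd(counts):
    """counts[i][j] = objects in cell (i, j). Returns the (nx+1) x (ny+1) cumulative grid,
    where cd[i][j] = objects in cells with x-index < i and y-index < j."""
    nx, ny = len(counts), len(counts[0])
    cd = [[0] * (ny + 1) for _ in range(nx + 1)]
    for i in range(nx):
        for j in range(ny):
            cd[i + 1][j + 1] = counts[i][j] + cd[i][j + 1] + cd[i + 1][j] - cd[i][j]
    return cd

def cd_at(cd, x, y, cell_w, cell_h):
    """Bilinearly interpolate the cumulative count F(x, y) from the four grid vertices around it."""
    nx, ny = len(cd) - 1, len(cd[0]) - 1
    gx = min(max(x / cell_w, 0.0), nx)
    gy = min(max(y / cell_h, 0.0), ny)
    i, j = min(int(gx), nx - 1), min(int(gy), ny - 1)
    fx, fy = gx - i, gy - j
    return (cd[i][j] * (1 - fx) * (1 - fy) + cd[i + 1][j] * fx * (1 - fy)
            + cd[i][j + 1] * (1 - fx) * fy + cd[i + 1][j + 1] * fx * fy)

def estimate_count(cd, window, cell_w, cell_h):
    """Objects in window (x1, y1, x2, y2) via inclusion-exclusion on the four corners."""
    x1, y1, x2, y2 = window
    def f(x, y):
        return cd_at(cd, x, y, cell_w, cell_h)
    return f(x2, y2) - f(x1, y2) - f(x2, y1) + f(x1, y1)
```

Dividing the estimated count by the total number of objects (`cd[-1][-1]`) gives the selectivity. For a 2 × 2 grid of 10 × 10 cells holding 4, 0, 2, and 6 objects, the centered window (5, 5, 15, 15) covers a quarter of every cell and is estimated at 3 objects.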