Replication is an approach often used to speed up the execution of queries submitted to a large dataset.A compile-time/run-time approach is presented for minimizing the response time of 2-dimensional range when a dist...Replication is an approach often used to speed up the execution of queries submitted to a large dataset.A compile-time/run-time approach is presented for minimizing the response time of 2-dimensional range when a distributed replica of a dataset exists.The aim is to partition the query payload(and its range) into subsets and distribute those to the replica nodes in a way that minimizes a client's response time.However,since query size and distribution characteristics of data(data dense/sparse regions) in varying ranges are not known a priori,performing efficient load balancing and parallel processing over the unpredictable workload is difficult.A technique based on the creation and manipulation of dynamic spatial indexes for query payload estimation in distributed queries was proposed.The effectiveness of this technique was demonstrated on queries for analysis of archived earthquake-generated seismic data records.展开更多
The query optimizer uses cost-based optimization to create an execution plan with the least cost,which also consumes the least amount of resources.The challenge of query optimization for relational database systems is...The query optimizer uses cost-based optimization to create an execution plan with the least cost,which also consumes the least amount of resources.The challenge of query optimization for relational database systems is a combinatorial optimization problem,which renders exhaustive search impossible as query sizes rise.Increases in CPU performance have surpassed main memory,and disk access speeds in recent decades,allowing data compression to be used—strategies for improving database performance systems.For performance enhancement,compression and query optimization are the two most factors.Compression reduces the volume of data,whereas query optimization minimizes execution time.Compressing the database reduces memory requirement,data takes less time to load into memory,fewer buffer missing occur,and the size of intermediate results is more diminutive.This paper performed query optimization on the graph database in a cloud dew environment by considering,which requires less time to execute a query.The factors compression and query optimization improve the performance of the databases.This research compares the performance of MySQL and Neo4j databases in terms of memory usage and execution time running on cloud dew servers.展开更多
Dynamic programming(DP) is an effective query optimization approach to select an appropriate join order for relational database management system(RDBMS) in multi-table joins. This method was extended and made availabl...Dynamic programming(DP) is an effective query optimization approach to select an appropriate join order for relational database management system(RDBMS) in multi-table joins. This method was extended and made available in distributed DBMS(D-DBMS). The structure of this optimal solution was firstly characterized according to the distributing status of tables and data, and then the recurrence relations between a problem and its sub-problems were recursively defined. DP in D-DBMS has the same time-complexity with that in centralized DBMS, while it has the capability to solve a much more sophisticated optimal problem of multi-table join in D-DBMS. The effectiveness of this optimal strategy has been proved by experiments.展开更多
Semantic query optimization (SQO) is comparatively a recent approach for the transformation of given query into equivalent alternative query using matching rules in order to select an optimal query based on the costs ...Semantic query optimization (SQO) is comparatively a recent approach for the transformation of given query into equivalent alternative query using matching rules in order to select an optimal query based on the costs of executing alternative queries. The key aspect of the algorithm proposed here is that previous proposed SQO techniques can be considered equally in the uniform cost model, with which optimization opportunities will not be missed. At the same time, the authors used the implication closure to guarantee that any matched rule will not be lost. The authors implemented their algorithm for the optimization of decomposed sub-query in local database in Multi-Database Integrator (MDBI), which is a multidatabase project. The experimental results verify that this algorithm is effective in the process of SQO.展开更多
The author investigates the query optimization problem for parallel relational databases. A multi - weighted tree based query optimization method is proposed. The method consists of a multi - weighted tree based paral...The author investigates the query optimization problem for parallel relational databases. A multi - weighted tree based query optimization method is proposed. The method consists of a multi - weighted tree based parallel query plan model, a cost model for parallel qury plans and a query optimizer. The parallel query plan model is the first one to model all basic relational operations, all three types of parallelism of query execution, processor and memory allocation to operations, memory allocation to the buffers between operations in pipelines and data redistribution among processors. The cost model takes the waiting time of the operations in pipelining execution into consideration and is computable in a bottom - up fashion. The query optimizer addresses the query optimization problem in the context of Select - Project - Join queries that are widely used in commercial DBMSs. Several heuristics determining the processor allocation to operations are derived and used in the query optimizer. The query optimizer is aware of memory resources in order to generate good - quality plans. It includes the heuristics for determining the memory allocation to operations and buffers between operations in pipelines so that the memory resourse is fully exploit. In addition, multiple algorithms for implementing join operations are consided in the query optimizer. The query optimizer can make an optimal choice of join algorithm for each join operation in a query. The proposed query optimization method has been used in a prototype parallel database management system designed and implemented by the author.展开更多
Recently, attention has been focused on spatial query language which is used to query spatial databases. A design of spatial query language has been presented in this paper by extending the standard relational databas...Recently, attention has been focused on spatial query language which is used to query spatial databases. A design of spatial query language has been presented in this paper by extending the standard relational database query language SQL. It recognizes the significantly different requirements of spatial data handling and overcomes the inherent problems of the application of conventional database query languages. This design is based on an extended spatial data model, including the spatial data types and the spatial operators on them. The processing and optimization of spatial queries have also been discussed in this design. In the end, an implementation of this design is given in a spatial query subsystem.展开更多
The query processing in distributed database management systems(DBMS)faces more challenges,such as more operators,and more factors in cost models and meta-data,than that in a single-node DMBS,in which query optimizati...The query processing in distributed database management systems(DBMS)faces more challenges,such as more operators,and more factors in cost models and meta-data,than that in a single-node DMBS,in which query optimization is already an NP-hard problem.Learned query optimizers(mainly in the single-node DBMS)receive attention due to its capability to capture data distributions and flexible ways to avoid hard-craft rules in refinement and adaptation to new hardware.In this paper,we focus on extensions of learned query optimizers to distributed DBMSs.Specifically,we propose one possible but general architecture of the learned query optimizer in the distributed context and highlight differences from the learned optimizer in the single-node ones.In addition,we discuss the challenges and possible solutions.展开更多
Through the mapping from UMQL ( unified multimedia query language) conditional expressions to UMQA (unified multimedia query algebra) query operations, a translation algorithm from a UMQL query to a UMQA query pla...Through the mapping from UMQL ( unified multimedia query language) conditional expressions to UMQA (unified multimedia query algebra) query operations, a translation algorithm from a UMQL query to a UMQA query plan is put forward, which can generate an equivalent UMQA internal query plan for any UMQL query. Then, to improve the execution costs of UMQA query plans effectively, equivalent UMQA translation formulae and general optimization strategies are studied, and an optimization algorithm for UMQA internal query plans is presented. This algorithm uses equivalent UMQA translation formulae to optimize query plans, and makes the optimized query plans accord with the optimization strategies as much as possible. Finally, the logic implementation methods of UMQA plans, i.e., logic implementation methods of UMQA operators, are discussed to obtain useful target data from a muifirnedia database. All of these algorithms are implemented in a UMQL prototype system. Application results show that these query processing techniques are feasible and applicable.展开更多
The traditional method first classifies the user information and combines the query method to retrieve the interest information, but neglects to calculate the weight of the user interest information, which leads to th...The traditional method first classifies the user information and combines the query method to retrieve the interest information, but neglects to calculate the weight of the user interest information, which leads to the low retrieval accuracy. A retrieval method based on the fuzzy proximity classification technology is proposed. Approximation between the fuzzy sets is used to represent the consistency between the user interest information features, and the consistency calculation formula and the skewness confidence matrix between the user interest information features are given. The fuzzy classification of the user interest information can obtain the most consistent confidence data and eliminate the redundant approximation interference data. The probabilistic model of the information word frequency and the user interest information length calculates the weight of the user interest information, and adjusts the weight formula constantly.展开更多
The rigid structure of the traditional relational database leads to data redundancy,which seriously affects the efficiency of the data query and cannot effectively manage massive data.To solve this problem,we use dist...The rigid structure of the traditional relational database leads to data redundancy,which seriously affects the efficiency of the data query and cannot effectively manage massive data.To solve this problem,we use distributed storage and parallel computing technology to query RDF data.In order to achieve efficient storage and retrieval of large-scale RDF data,we combine the respective advantage of the storage model of the relational database and the distributed query.To overcome the disadvantages of storing and querying RDF data,we design and implement a breadth-first path search algorithm based on the keyword query on a distributed platform.We conduct the LUBM query statements respectively with the selected data sets.In experiments,we compare query response time in different conditions to evaluate the feasibility and correctness of our approaches.The results show that the proposed scheme can reduce the storage cost and improve query efficiency.展开更多
Fundamentally, semantic grid database is about bringing globally distributed databases together in order to coordinate resource sharing and problem solving in which information is given well-defined meaning, and DartG...Fundamentally, semantic grid database is about bringing globally distributed databases together in order to coordinate resource sharing and problem solving in which information is given well-defined meaning, and DartGrid II is the implemented database gird system whose goal is to provide a semantic solution for integrating database resources on the Web. Although many algorithms have been proposed for optimizing query-processing in order to minimize costs and/or response time, associated with obtaining the answer to query in a distributed database system, database grid query optimization problem is fundamentally different from traditional distributed query optimization. These differences are shown to be the consequences of autonomy and heterogeneity of database nodes in database grid. Therefore, more challenges have arisen for query optimization in database grid than traditional distributed database. Following this observation, the design of a query optimizer in DartGrid II is presented, and a heuristic, dynamic and parallel query optimization approach to processing query in database grid is proposed. A set of semantic tools supporting relational database integration and semantic-based information browsing has also been implemented to realize the above vision.展开更多
A systematic, efficient compilation method for query evaluation of DeductiveDatabases (DeDB) is proposed in this paper. In order to eliminate redundancyand to minimize the potentially relevant facts, which are two key...A systematic, efficient compilation method for query evaluation of DeductiveDatabases (DeDB) is proposed in this paper. In order to eliminate redundancyand to minimize the potentially relevant facts, which are two key issues to theefficiency of a DeDB, the compilation process is decomposed into two phases.The first is the pre-compilation phase, which is responsible for the minimiza-tion of the potentially relevant facts. The second, which we refer to as thegeneral compilation phase, is responsible for the elimination of redundancy.The rule/goal graph devised by J. D. Ullman is appropriately extended andused as a uniform formalism. Two general algorithms corresponding to the twophases respectively are described intuitively and formally展开更多
We developed a parallel object relational DBMS named PORLES. It uses BSP model as its parallel computing model, and monoid calculus as its basis of data model. In this paper, we introduce its data model, parallel que...We developed a parallel object relational DBMS named PORLES. It uses BSP model as its parallel computing model, and monoid calculus as its basis of data model. In this paper, we introduce its data model, parallel query optimization, transaction processing system and parallel access method in detail.展开更多
针对传统的数据库管理系统无法很好地学习谓词之间的交互以及无法准确地估计复杂查询的基数问题,提出了一种树形结构的长短期记忆神经网络(Tree Long Short Term Memory, TreeLSTM)模型建模查询,并使用该模型对新的查询基数进行估计.所...针对传统的数据库管理系统无法很好地学习谓词之间的交互以及无法准确地估计复杂查询的基数问题,提出了一种树形结构的长短期记忆神经网络(Tree Long Short Term Memory, TreeLSTM)模型建模查询,并使用该模型对新的查询基数进行估计.所提出的模型考虑了查询语句中包含的合取和析取运算,根据谓词之间的操作符类型将子表达式构建为树形结构,根据组合子表达式向量来表示连续向量空间中的任意逻辑表达式.TreeLSTM模型通过捕捉查询谓词之间的顺序依赖关系从而提升基数估计的性能和准确度,将TreeLSTM与基于直方图方法、基于学习的MSCN和TreeRNN方法进行了比较.实验结果表明:TreeLSTM的估算误差比直方图、MSCN、TreeRNN方法的误差分别降低了60.41%,33.33%和11.57%,该方法显著提高了基数估计器的性能.展开更多
文摘Replication is an approach often used to speed up the execution of queries submitted to a large dataset.A compile-time/run-time approach is presented for minimizing the response time of 2-dimensional range when a distributed replica of a dataset exists.The aim is to partition the query payload(and its range) into subsets and distribute those to the replica nodes in a way that minimizes a client's response time.However,since query size and distribution characteristics of data(data dense/sparse regions) in varying ranges are not known a priori,performing efficient load balancing and parallel processing over the unpredictable workload is difficult.A technique based on the creation and manipulation of dynamic spatial indexes for query payload estimation in distributed queries was proposed.The effectiveness of this technique was demonstrated on queries for analysis of archived earthquake-generated seismic data records.
文摘The query optimizer uses cost-based optimization to create an execution plan with the least cost,which also consumes the least amount of resources.The challenge of query optimization for relational database systems is a combinatorial optimization problem,which renders exhaustive search impossible as query sizes rise.Increases in CPU performance have surpassed main memory,and disk access speeds in recent decades,allowing data compression to be used—strategies for improving database performance systems.For performance enhancement,compression and query optimization are the two most factors.Compression reduces the volume of data,whereas query optimization minimizes execution time.Compressing the database reduces memory requirement,data takes less time to load into memory,fewer buffer missing occur,and the size of intermediate results is more diminutive.This paper performed query optimization on the graph database in a cloud dew environment by considering,which requires less time to execute a query.The factors compression and query optimization improve the performance of the databases.This research compares the performance of MySQL and Neo4j databases in terms of memory usage and execution time running on cloud dew servers.
文摘Dynamic programming(DP) is an effective query optimization approach to select an appropriate join order for relational database management system(RDBMS) in multi-table joins. This method was extended and made available in distributed DBMS(D-DBMS). The structure of this optimal solution was firstly characterized according to the distributing status of tables and data, and then the recurrence relations between a problem and its sub-problems were recursively defined. DP in D-DBMS has the same time-complexity with that in centralized DBMS, while it has the capability to solve a much more sophisticated optimal problem of multi-table join in D-DBMS. The effectiveness of this optimal strategy has been proved by experiments.
文摘Semantic query optimization (SQO) is comparatively a recent approach for the transformation of given query into equivalent alternative query using matching rules in order to select an optimal query based on the costs of executing alternative queries. The key aspect of the algorithm proposed here is that previous proposed SQO techniques can be considered equally in the uniform cost model, with which optimization opportunities will not be missed. At the same time, the authors used the implication closure to guarantee that any matched rule will not be lost. The authors implemented their algorithm for the optimization of decomposed sub-query in local database in Multi-Database Integrator (MDBI), which is a multidatabase project. The experimental results verify that this algorithm is effective in the process of SQO.
基金Supported by the National Natural Science Foundation of China National (9846-004) '863' High -Technique Program of China (8
文摘The author investigates the query optimization problem for parallel relational databases. A multi - weighted tree based query optimization method is proposed. The method consists of a multi - weighted tree based parallel query plan model, a cost model for parallel qury plans and a query optimizer. The parallel query plan model is the first one to model all basic relational operations, all three types of parallelism of query execution, processor and memory allocation to operations, memory allocation to the buffers between operations in pipelines and data redistribution among processors. The cost model takes the waiting time of the operations in pipelining execution into consideration and is computable in a bottom - up fashion. The query optimizer addresses the query optimization problem in the context of Select - Project - Join queries that are widely used in commercial DBMSs. Several heuristics determining the processor allocation to operations are derived and used in the query optimizer. The query optimizer is aware of memory resources in order to generate good - quality plans. It includes the heuristics for determining the memory allocation to operations and buffers between operations in pipelines so that the memory resourse is fully exploit. In addition, multiple algorithms for implementing join operations are consided in the query optimizer. The query optimizer can make an optimal choice of join algorithm for each join operation in a query. The proposed query optimization method has been used in a prototype parallel database management system designed and implemented by the author.
基金This work is supported by the National High Technology Research and Development Program ofChina(2 0 0 2 AA135 2 30 ) and the Major Project of National Natural Science Foundation of Beijing(4 0 110 0 2 ) .
文摘Recently, attention has been focused on spatial query language which is used to query spatial databases. A design of spatial query language has been presented in this paper by extending the standard relational database query language SQL. It recognizes the significantly different requirements of spatial data handling and overcomes the inherent problems of the application of conventional database query languages. This design is based on an extended spatial data model, including the spatial data types and the spatial operators on them. The processing and optimization of spatial queries have also been discussed in this design. In the end, an implementation of this design is given in a spatial query subsystem.
基金partially supported by NSFC under Grant Nos.61832001 and 62272008ZTE Industry-University-Institute Fund Project。
文摘The query processing in distributed database management systems(DBMS)faces more challenges,such as more operators,and more factors in cost models and meta-data,than that in a single-node DMBS,in which query optimization is already an NP-hard problem.Learned query optimizers(mainly in the single-node DBMS)receive attention due to its capability to capture data distributions and flexible ways to avoid hard-craft rules in refinement and adaptation to new hardware.In this paper,we focus on extensions of learned query optimizers to distributed DBMSs.Specifically,we propose one possible but general architecture of the learned query optimizer in the distributed context and highlight differences from the learned optimizer in the single-node ones.In addition,we discuss the challenges and possible solutions.
基金The National High Technology Research and Development Program of China(863 Program) (No.2006AA01Z430)
文摘Through the mapping from UMQL ( unified multimedia query language) conditional expressions to UMQA (unified multimedia query algebra) query operations, a translation algorithm from a UMQL query to a UMQA query plan is put forward, which can generate an equivalent UMQA internal query plan for any UMQL query. Then, to improve the execution costs of UMQA query plans effectively, equivalent UMQA translation formulae and general optimization strategies are studied, and an optimization algorithm for UMQA internal query plans is presented. This algorithm uses equivalent UMQA translation formulae to optimize query plans, and makes the optimized query plans accord with the optimization strategies as much as possible. Finally, the logic implementation methods of UMQA plans, i.e., logic implementation methods of UMQA operators, are discussed to obtain useful target data from a muifirnedia database. All of these algorithms are implemented in a UMQL prototype system. Application results show that these query processing techniques are feasible and applicable.
文摘The traditional method first classifies the user information and combines the query method to retrieve the interest information, but neglects to calculate the weight of the user interest information, which leads to the low retrieval accuracy. A retrieval method based on the fuzzy proximity classification technology is proposed. Approximation between the fuzzy sets is used to represent the consistency between the user interest information features, and the consistency calculation formula and the skewness confidence matrix between the user interest information features are given. The fuzzy classification of the user interest information can obtain the most consistent confidence data and eliminate the redundant approximation interference data. The probabilistic model of the information word frequency and the user interest information length calculates the weight of the user interest information, and adjusts the weight formula constantly.
基金This work is supported in part by National Natural Science Foundation of China(61728204)Innovation Funding(NJ20160028,NT2018027,NT2018028,NS2018057)+1 种基金Aeronautical Science Foundation of China(2016551500)State Key Laboratory for smart grid protection and operation control Foundation,Association of Chinese Graduate Education(ACGE).
文摘The rigid structure of the traditional relational database leads to data redundancy,which seriously affects the efficiency of the data query and cannot effectively manage massive data.To solve this problem,we use distributed storage and parallel computing technology to query RDF data.In order to achieve efficient storage and retrieval of large-scale RDF data,we combine the respective advantage of the storage model of the relational database and the distributed query.To overcome the disadvantages of storing and querying RDF data,we design and implement a breadth-first path search algorithm based on the keyword query on a distributed platform.We conduct the LUBM query statements respectively with the selected data sets.In experiments,we compare query response time in different conditions to evaluate the feasibility and correctness of our approaches.The results show that the proposed scheme can reduce the storage cost and improve query efficiency.
基金Acknowledgements: This work is supported by Youth Foundation of Jilin University (No. 419070100102), NSFC of China Major Research Program 60496321, Basic Theory and Core Techniques of Non Canonical Knowledge National Natural Science Foundation of China under Grant No. 60373098 and No. 60173006, the National High-Tech Research and Development Plan of China under Grant No.2003AA118020, the Major Program of Science and Technology Development Plan of Jilin Province under Grant No. 20020303, the Science and Technology Development Plan of Jilin Province under Grant No. 20030523.
文摘Fundamentally, semantic grid database is about bringing globally distributed databases together in order to coordinate resource sharing and problem solving in which information is given well-defined meaning, and DartGrid II is the implemented database gird system whose goal is to provide a semantic solution for integrating database resources on the Web. Although many algorithms have been proposed for optimizing query-processing in order to minimize costs and/or response time, associated with obtaining the answer to query in a distributed database system, database grid query optimization problem is fundamentally different from traditional distributed query optimization. These differences are shown to be the consequences of autonomy and heterogeneity of database nodes in database grid. Therefore, more challenges have arisen for query optimization in database grid than traditional distributed database. Following this observation, the design of a query optimizer in DartGrid II is presented, and a heuristic, dynamic and parallel query optimization approach to processing query in database grid is proposed. A set of semantic tools supporting relational database integration and semantic-based information browsing has also been implemented to realize the above vision.
文摘A systematic, efficient compilation method for query evaluation of DeductiveDatabases (DeDB) is proposed in this paper. In order to eliminate redundancyand to minimize the potentially relevant facts, which are two key issues to theefficiency of a DeDB, the compilation process is decomposed into two phases.The first is the pre-compilation phase, which is responsible for the minimiza-tion of the potentially relevant facts. The second, which we refer to as thegeneral compilation phase, is responsible for the elimination of redundancy.The rule/goal graph devised by J. D. Ullman is appropriately extended andused as a uniform formalism. Two general algorithms corresponding to the twophases respectively are described intuitively and formally
文摘We developed a parallel object relational DBMS named PORLES. It uses BSP model as its parallel computing model, and monoid calculus as its basis of data model. In this paper, we introduce its data model, parallel query optimization, transaction processing system and parallel access method in detail.
文摘针对传统的数据库管理系统无法很好地学习谓词之间的交互以及无法准确地估计复杂查询的基数问题,提出了一种树形结构的长短期记忆神经网络(Tree Long Short Term Memory, TreeLSTM)模型建模查询,并使用该模型对新的查询基数进行估计.所提出的模型考虑了查询语句中包含的合取和析取运算,根据谓词之间的操作符类型将子表达式构建为树形结构,根据组合子表达式向量来表示连续向量空间中的任意逻辑表达式.TreeLSTM模型通过捕捉查询谓词之间的顺序依赖关系从而提升基数估计的性能和准确度,将TreeLSTM与基于直方图方法、基于学习的MSCN和TreeRNN方法进行了比较.实验结果表明:TreeLSTM的估算误差比直方图、MSCN、TreeRNN方法的误差分别降低了60.41%,33.33%和11.57%,该方法显著提高了基数估计器的性能.