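Abstract: Traditional database management systems cannot adequately learn the interactions between predicates and cannot accurately estimate the cardinality of complex queries. To address this, a tree-structured long short-term memory network (Tree Long Short Term Memory, TreeLSTM) model is proposed to model queries and is then used to estimate the cardinality of new queries. The proposed model takes into account the conjunction and disjunction operations contained in a query statement: it builds the sub-expressions into a tree structure according to the operator types between predicates, and it represents arbitrary logical expressions in a continuous vector space by combining sub-expression vectors. By capturing the sequential dependencies between query predicates, the TreeLSTM model improves the performance and accuracy of cardinality estimation. TreeLSTM was compared with a histogram-based method and with the learning-based MSCN and TreeRNN methods. Experimental results show that the estimation error of TreeLSTM is 60.41%, 33.33%, and 11.57% lower than that of the histogram, MSCN, and TreeRNN methods, respectively, a significant improvement in the performance of the cardinality estimator.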
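The tree construction described in this abstract can be illustrated with a minimal sketch. It is not the authors' model: the class names `Pred` and `Op` and the function `combine` are hypothetical, and the learned TreeLSTM composition of sub-expression vectors is replaced here by a fixed rule that assumes independent predicates, which is the kind of estimate the abstract says the learned model improves on.

```python
from dataclasses import dataclass
from typing import List, Union


@dataclass
class Pred:
    """Leaf node: one predicate with an assumed standalone selectivity."""
    text: str
    selectivity: float


@dataclass
class Op:
    """Internal node: a conjunction ("AND") or disjunction ("OR") of children."""
    kind: str
    children: List["Node"]


Node = Union[Pred, Op]


def combine(node: Node) -> float:
    """Fold the expression tree bottom-up into one selectivity.

    Stand-in for the learned composition: predicates are assumed
    independent, so AND multiplies and OR uses the complement rule.
    """
    if isinstance(node, Pred):
        return node.selectivity
    parts = [combine(child) for child in node.children]
    if node.kind == "AND":
        result = 1.0
        for p in parts:
            result *= p
        return result
    if node.kind == "OR":
        miss = 1.0
        for p in parts:
            miss *= 1.0 - p
        return 1.0 - miss
    raise ValueError(f"unknown operator: {node.kind}")


# Hypothetical query: (age > 30 AND city = 'X') OR vip = true
tree = Op("OR", [
    Op("AND", [Pred("age > 30", 0.5), Pred("city = 'X'", 0.2)]),
    Pred("vip = true", 0.1),
])
estimate = combine(tree)
```

In a learned variant, each node would return a vector rather than a scalar, and the AND/OR rules would be replaced by trained composition cells selected by operator type.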
Funding: This work was supported by the National Natural Science Foundation of China [grant numbers 41222009, 41271405].
Abstract: Spatial selectivity estimation is crucial for choosing the cheapest execution plan for a given query in a query optimizer. This article proposes an accurate spatial selectivity estimation method based on cumulative density (CD) histograms, which can deal with any arbitrary spatial query window. In this method, the selectivity can be estimated using the original logic of the CD histogram, after the four corner values of a query window have been accurately interpolated on the continuous surface of the elevation histogram. To interpolate any corner point, we first identify the cells that can affect the value of point (x, y) in the CD histogram. These cells fall into two classes: those lying within the range from (0, 0) to (x, y), and those that only overlap the range from (0, 0) to (x, y). The values of the former class can be used directly, whereas we revise the value of any cell in the latter class using the number of vertices in the corresponding cell and the ratio of the cell's area covered by the range from (0, 0) to (x, y). This revision makes the estimation method more accurate. The CD histograms and the estimation method have been implemented in INGRES. Experimental results show that the method can accurately estimate the selectivity of arbitrary query windows and can help the optimizer choose a cheaper query plan.
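The corner interpolation and four-corner combination described in this abstract can be sketched as follows. This is a simplified illustration, not the INGRES implementation: it assumes a uniform square grid that stores only a vertex count per cell, it assumes vertices are spread evenly inside each cell, and the names `corner_value` and `window_count` are hypothetical.

```python
from typing import List


def corner_value(counts: List[List[int]], cell: float, x: float, y: float) -> float:
    """Estimate the number of vertices in the range (0, 0) to (x, y).

    counts[i][j] is the vertex count of the cell whose lower-left corner
    is (i * cell, j * cell). Cells fully inside the range contribute their
    whole count; cells that only overlap it contribute count * area ratio.
    """
    total = 0.0
    for i, column in enumerate(counts):
        for j, n in enumerate(column):
            # Width and height of the part of this cell covered by the range.
            w = min(max(x - i * cell, 0.0), cell)
            h = min(max(y - j * cell, 0.0), cell)
            total += n * (w * h) / (cell * cell)
    return total


def window_count(counts: List[List[int]], cell: float,
                 x1: float, y1: float, x2: float, y2: float) -> float:
    """Combine the four interpolated corner values by inclusion-exclusion."""
    return (corner_value(counts, cell, x2, y2)
            - corner_value(counts, cell, x1, y2)
            - corner_value(counts, cell, x2, y1)
            + corner_value(counts, cell, x1, y1))


# Hypothetical 2 x 2 grid of unit cells: 4 vertices in cell (0, 0), 8 in cell (1, 1).
grid = [[4, 0], [0, 8]]
hits = window_count(grid, 1.0, 0.5, 0.5, 1.5, 1.5)
selectivity = hits / 12  # fraction of all 12 vertices estimated to fall in the window
```

Dividing the estimated count by the total number of vertices gives a selectivity. Objects with spatial extent need more bookkeeping than a single count per cell, which this sketch leaves out.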