Abstract
In RDF (Resource Description Framework) graph querying, accurately estimating the size of a query result is a crucial step in query optimization. Previous methods ignore both the uncertainty of the RDF graph itself and the correlations between subqueries, and therefore fail to produce accurate estimates. To address this problem, this paper proposes a cardinality estimation method based on a Bayesian probability model. The method introduces a Bayesian network model to mine the dependencies between properties within each subquery. Building on these dependencies, it then proposes a subnet-joining approach to compute the impact factors between subqueries. Finally, this information is used to estimate the cardinality of the result set of an arbitrary query. Experiments show that, compared with existing methods, the accuracy of the proposed method improves by more than 15%, without a significant increase in query run time.
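To illustrate why property dependencies matter for cardinality estimation, the following is a minimal sketch and not the paper's algorithm. It assumes a hypothetical toy graph, a star query with two triple patterns on the same subject, and a two-node network (A -> B) whose conditional probability is counted directly from the data; all names (`TRIPLES`, `subjects_with`, `estimates`) are illustrative.

```python
from fractions import Fraction

# Hypothetical toy RDF graph as (subject, property, object) triples.
TRIPLES = [
    ("p1", "city", "Beijing"), ("p1", "lang", "zh"),
    ("p2", "city", "Beijing"), ("p2", "lang", "zh"),
    ("p3", "city", "Beijing"), ("p3", "lang", "en"),
    ("p4", "city", "London"), ("p4", "lang", "en"),
]


def subjects_with(prop, obj):
    """Subjects that have the triple (subject, prop, obj)."""
    return {s for s, p, o in TRIPLES if p == prop and o == obj}


def estimates(pattern_a, pattern_b):
    """Return (independence estimate, dependency-aware estimate, true size)
    for a star query requiring both patterns on the same subject."""
    n = len({s for s, _, _ in TRIPLES})
    a = subjects_with(*pattern_a)
    b = subjects_with(*pattern_b)
    p_a = Fraction(len(a), n)
    p_b = Fraction(len(b), n)
    # Baseline: treat the two properties as independent.
    independent = n * p_a * p_b
    # Two-node network A -> B: use P(B | A) counted from the data.
    p_b_given_a = Fraction(len(a & b), len(a)) if a else Fraction(0)
    dependent = n * p_a * p_b_given_a
    return independent, dependent, len(a & b)


ind, dep, true_size = estimates(("city", "Beijing"), ("lang", "zh"))
print(f"independence: {float(ind)}, with P(B|A): {float(dep)}, true: {true_size}")
```

On this toy graph the conditional estimate matches the true count by construction, because the conditional probability is read off the full data. The point is only that the independence assumption drifts once properties are correlated; correlations between subqueries, which the abstract handles through impact factors, are not modeled here.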
Source
《电子学报》
EI
CAS
CSCD
PKU Core (Peking University Core Journals)
2015, No. 9, pp. 1745-1749 (5 pages)
Acta Electronica Sinica