Cardinality Estimation of Federated Complex Queries Based on Query Feature Representation Learning
Abstract: Accurate cardinality estimation is key to producing an optimal query plan, and most existing approaches tackle the cardinality estimation problem with deep learning. However, these methods, which are based on RDF graph patterns, focus on simple queries with specific topologies; their scope is limited, and they do not support the complex queries that are common in real-world workloads. To address these problems, we propose a cardinality estimation model for federated complex queries based on query feature representation learning. The model targets complex queries that contain the FILTER or DISTINCT keyword. A newly proposed FILTER query featurization method represents a SPARQL query as a feature vector, from which the model predicts the query cardinality. The model is also used to predict the ratio of unique rows in DISTINCT queries. Experiments on the LUBM dataset show that, compared with state-of-the-art cardinality estimation methods, the model delivers strong estimation quality, with the median of the average estimation error reaching 1.16, and shows potential and scalability for cardinality estimation of multi-join queries.
Authors: XU Jiao; TIAN Ping-fang; GU Jin-guang; XU Fang-fang (School of Computer Science and Technology, Wuhan University of Science and Technology, Wuhan 430065, China; Hubei Province Key Laboratory of Intelligent Information Processing and Real-time Industrial System, Wuhan 430065, China; Institute of Big Data Science and Engineering Research, Wuhan University of Science and Technology, Wuhan 430065, China; Key Laboratory of Rich Media Digital Publishing Content Organization and Knowledge Service, National Press and Publication Administration, Beijing 100083, China)
Source: Computer Technology and Development (《计算机技术与发展》), 2024, Issue 2, pp. 32-39 (8 pages)
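Funding: Science and Technology Innovation 2030 "New Generation Artificial Intelligence" Major Project (2020AAA0108500); National Natural Science Foundation of China (U1836118); Open Fund of the Key Laboratory of Rich Media Digital Publishing Content Organization and Knowledge Service (ZD2021-11/01).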
Keywords: federated system; query optimization; complex query; deep learning; cardinality estimation
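
The abstract mentions two mechanisms without giving details: turning a FILTER query into a feature vector, and predicting a unique-row ratio for DISTINCT queries. The Python sketch below illustrates what such steps might look like. It is not the paper's method: the operator vocabulary, the one-hot layout, the min-max normalization, and the names `FilterPredicate`, `featurize_filter`, and `distinct_cardinality` are all assumptions made for illustration.

```python
from dataclasses import dataclass

# Hypothetical operator vocabulary; the abstract does not say which
# FILTER operators the paper's featurization covers.
OPERATORS = ["=", "!=", "<", "<=", ">", ">="]


@dataclass
class FilterPredicate:
    """One numeric FILTER condition, e.g. FILTER(?age > 30)."""
    variable: str
    operator: str
    value: float


def featurize_filter(pred, var_index, min_value, max_value):
    """Encode a predicate as [variable one-hot | operator one-hot | scaled value].

    var_index maps variable names to positions; min_value and max_value are
    the assumed bounds of the filtered attribute, used for min-max scaling.
    """
    var_vec = [0.0] * len(var_index)
    var_vec[var_index[pred.variable]] = 1.0

    op_vec = [0.0] * len(OPERATORS)
    op_vec[OPERATORS.index(pred.operator)] = 1.0

    span = max_value - min_value
    scaled = (pred.value - min_value) / span if span > 0 else 0.0
    scaled = min(1.0, max(0.0, scaled))  # clamp literals outside the bounds
    return var_vec + op_vec + [scaled]


def distinct_cardinality(estimated_rows, unique_ratio):
    """Scale a row-count estimate by a predicted unique-row ratio in [0, 1]."""
    if not 0.0 <= unique_ratio <= 1.0:
        raise ValueError("unique_ratio must lie in [0, 1]")
    return estimated_rows * unique_ratio
```

Under these assumptions, `featurize_filter(FilterPredicate("?age", ">", 30.0), {"?x": 0, "?age": 1}, 0.0, 120.0)` yields a 9-element vector that could be fed to a regression model, and `distinct_cardinality` would be applied to that model's row-count estimate when the query uses DISTINCT.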