A Strategy of Efficient and Accurate Cardinality Estimation Based on Query Result
Abstract: Cardinality estimation is an important component of query optimization, and its efficiency and accuracy directly affect the effectiveness of query optimization. Traditional cardinality estimation strategies collect statistics from the original table or a sample of it, then derive cardinalities from the collected statistics. This approach has several weaknesses: statistics collection is inefficient when the data volume is large; the statistics lag behind updates, and because cardinalities are derived rather than observed, their accuracy cannot be guaranteed; and although some strategies obtain actual cardinalities from the feedback of executed subqueries, they do not save the results, so fetching cardinalities remains inefficient. To address these problems, this paper proposes an efficient and accurate cardinality estimation strategy based on query results (CEQR). CEQR takes its statistics directly from query execution results, so no derivation is needed, the accuracy of the cardinality is preserved, and collection efficiency is independent of the size of the original table. It builds a cardinality table that stores the statistics of base tables and intermediate results under specific predicates to serve subsequent queries, together with a set of maintenance rules for managing that table. It also introduces a resource-aware strategy that maps (hashes) cardinality items to an appropriate cache to speed up statistics retrieval. The paper gives an adaptability and error analysis of CEQR, and experiments show that CEQR is more efficient than traditional cardinality estimation strategies.
Authors: Gao Jintao; Li Zhanhuai; Liu Wenjie (School of Computer Science, Northwestern Polytechnical University, Xi'an 710072, China)
Source: Journal of Northwestern Polytechnical University (《西北工业大学学报》), indexed in EI, CAS, CSCD, and the Peking University Core Journals list; 2018, No. 4, pp. 768-777 (10 pages)
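Funding: Supported by the National High Technology Research and Development Program of China (863 Program) (2015AA015307), the Basic Research Program of Shaanxi Province (2017JM6104), and the National Natural Science Foundation of China (61732014)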
Keywords: big data; cardinality estimation; query optimization; query result; efficient; accurate
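
The abstract describes a cardinality table that saves exact counts observed from executed queries, keyed by the tables involved and the predicate, with maintenance rules and a cache in front of it. The abstract does not give the paper's data structures, so the following is only a minimal sketch of that general idea. The class name `CardinalityTable`, its methods, the whitespace/case predicate normalization, the invalidate-on-update rule, and the LRU cache are all assumptions made for illustration. They are not the authors' design, and in particular the LRU cache is a plain stand-in, not the paper's resource-aware strategy.

```python
from collections import OrderedDict
from typing import Dict, FrozenSet, Iterable, Optional, Tuple

# Key: (set of relations involved, normalized predicate text). Hypothetical format.
Key = Tuple[FrozenSet[str], str]


class CardinalityTable:
    """Hypothetical store of exact row counts observed from executed queries."""

    def __init__(self, cache_size: int = 128) -> None:
        self._table: Dict[Key, int] = {}        # full cardinality table
        self._cache: "OrderedDict[Key, int]" = OrderedDict()  # small LRU cache
        self._cache_size = cache_size

    @staticmethod
    def _key(relations: Iterable[str], predicate: str) -> Key:
        # Simplistic normalization: lowercase and collapse whitespace.
        # (This would also lowercase string literals; acceptable for a sketch.)
        return frozenset(relations), " ".join(predicate.lower().split())

    def record(self, relations: Iterable[str], predicate: str, row_count: int) -> None:
        """Save the row count actually produced by an executed (sub)query."""
        key = self._key(relations, predicate)
        self._table[key] = row_count
        self._cache.pop(key, None)  # drop any stale cached copy

    def lookup(self, relations: Iterable[str], predicate: str) -> Optional[int]:
        """Return a saved count, or None if this predicate was never observed."""
        key = self._key(relations, predicate)
        if key in self._cache:
            self._cache.move_to_end(key)  # mark as most recently used
            return self._cache[key]
        count = self._table.get(key)
        if count is not None:
            self._cache[key] = count
            if len(self._cache) > self._cache_size:
                self._cache.popitem(last=False)  # evict least recently used
        return count

    def invalidate(self, relation: str) -> None:
        """Assumed maintenance rule: drop every entry touching an updated table."""
        stale = [k for k in self._table if relation in k[0]]
        for k in stale:
            del self._table[k]
            self._cache.pop(k, None)


if __name__ == "__main__":
    table = CardinalityTable()
    table.record(["orders"], "amount > 100", 42)
    print(table.lookup(["orders"], "AMOUNT  >  100"))
    table.invalidate("orders")
    print(table.lookup(["orders"], "amount > 100"))
```

In this sketch an optimizer would call `lookup` first and fall back to its usual estimator when the result is `None`. Invalidating every entry that touches an updated table is deliberately conservative: it trades reuse for correctness, whereas the maintenance rules in the paper may be more selective.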