
Multi-Query Optimization Strategy in Column-Based OLAP System (cited by: 2)
Abstract: In column-store OLAP (on-line analytical processing) systems, column scan, column join, and aggregation operations incur heavy I/O and CPU overhead. This paper proposes a multi-query optimization strategy that allows these operations to be shared and reused. Based on the characteristics of column stores and OLAP workloads, it defines a series of transformation rules that generate a single global query plan for the set of related queries produced by one OLAP request. To enable sharing and reuse, the global plan introduces four new node types: the filter node, the group-by node, the merge node, and the aggregation node. Adapting MuGA (multiply group by algorithm), the group-by, merge, and join nodes mark each dimension-table and fact-table tuple with a group number, so that column scans and column joins can be shared. For the aggregation node, the paper proposes a multi-phase aggregation algorithm that uses the compound group numbers finally produced for the fact table to reuse aggregation work. Experiments on the SSB (star schema benchmark) data set demonstrate the effectiveness of the strategy.
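The Python sketch below illustrates the general idea of group-number marking followed by rollup-based aggregation reuse; it is not the paper's algorithm. The function names (`group_by_node`, `join_node`, `merge_node`, `aggregate_phase1`, `roll_up`), the mixed-radix encoding of compound group numbers, the positional foreign keys, and the toy data are all hypothetical assumptions chosen for illustration. One request asks for revenue grouped by (region, year), by region, and by year: the fact table is scanned and joined once, and the two coarser results are derived from the finest one.

```python
from collections import defaultdict

# Hypothetical sketch: group-number marking and multi-phase aggregation
# over a toy star schema. Not the paper's exact method.


def group_by_node(values, predicate=lambda v: True):
    """Give each dimension row a dense group number for its group-by value;
    rows rejected by the filter predicate are marked -1."""
    ids = {}
    marks = []
    for v in values:
        if predicate(v):
            marks.append(ids.setdefault(v, len(ids)))
        else:
            marks.append(-1)
    labels = sorted(ids, key=ids.get)  # labels[g] is the value of group g
    return marks, labels


def join_node(fact_fk, dim_marks):
    """Propagate dimension group numbers to fact rows. Assumes each foreign
    key is the 0-based row position of the dimension tuple."""
    return [dim_marks[k] for k in fact_fk]


def merge_node(marks_a, marks_b, size_b):
    """Combine two group-number columns into one compound number
    ga * size_b + gb; a row filtered out on either side stays -1."""
    return [-1 if a < 0 or b < 0 else a * size_b + b
            for a, b in zip(marks_a, marks_b)]


def aggregate_phase1(compound, measure):
    """Phase 1: a single pass over the measure column, summing per compound
    group number (the finest grouping in the request)."""
    sums = defaultdict(int)
    for g, m in zip(compound, measure):
        if g >= 0:
            sums[g] += m
    return dict(sums)


def roll_up(fine, size_b, keep):
    """Phase 2: derive a coarser GROUP BY from the fine result by decoding
    each compound number, without rescanning the fact table."""
    out = defaultdict(int)
    for g, s in fine.items():
        a, b = divmod(g, size_b)
        out[a if keep == "a" else b] += s
    return dict(out)


# Toy SSB-like data (hypothetical): customer.region, date.year (filtered to
# year >= 1997), and a lineorder fact table with two foreign keys.
region_marks, region_labels = group_by_node(["ASIA", "EUROPE", "ASIA", "AMERICA"])
year_marks, year_labels = group_by_node([1996, 1997, 1998], lambda y: y >= 1997)

cust_fk = [0, 1, 2, 3, 0]
date_fk = [1, 2, 0, 1, 2]
revenue = [10, 20, 30, 40, 50]

compound = merge_node(join_node(cust_fk, region_marks),
                      join_node(date_fk, year_marks), len(year_labels))
fine = aggregate_phase1(compound, revenue)  # GROUP BY region, year

# The other two queries of the same request reuse `fine`.
n_years = len(year_labels)
by_region = {region_labels[a]: s for a, s in roll_up(fine, n_years, "a").items()}
by_year = {year_labels[b]: s for b, s in roll_up(fine, n_years, "b").items()}
print(by_region)
print(by_year)
```

Rolling up from the finest grouping works directly for distributive aggregates such as SUM, COUNT, MIN, and MAX; an AVG would have to carry both a sum and a count through phase 1 so that phase 2 can recombine them.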
Source: Journal of Frontiers of Computer Science and Technology (CSCD-indexed), 2012, No. 9, pp. 852-864 (13 pages).
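Funding: National Natural Science Foundation of China (Nos. 61070031, 61070032, 61103046); National Science and Technology Major Project "Core Electronic Devices, High-End Generic Chips and Basic Software Products" (No. 2010ZX01042-001-003-004).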
Keywords: column store; on-line analytical processing (OLAP); multi-query optimization; global query plan; operation reuse