摘要
为了使列存储OLAP(on-line analytical processing)操作中I/O和CPU开销较大的扫描、连接、聚集操作实现有效的共享和复用,提出了一个多查询优化技术。根据列存储以及OLAP操作的特点,提出了一系列转换规则,为OLAP查询请求产生的一组相关查询语句生成一个单一全局查询计划。为了达到共享复用的目的,在全局计划中引入新的过滤结点、分组结点、合并结点和聚集结点。同时,借用MuGA(multiply group by algo-rithm)算法,通过分组结点、合并结点、连接结点实现维表及事实表元组的分组序号标记,从而实现列扫描、列连接的共享。并为聚集结点提出了一个多阶段聚集算法,结合最终生成的事实表复合分组序号,实现聚集操作的复用。在SSB(star schema benchmark)数据集上设计实验,证明了该多查询优化策略的有效性。
This paper provides a multi-query optimization strategy to achieve the share and reuse of column scan, column join and aggregation operations in column store OLAP (on-line analytical processing) systems which can easily lead to large I/O and CPU overhead. According to the features of column-stores and OLAP applications, the paper proposes a series of transformation rules to generate a single global query plan for a set of related queries mapped from a certain OLAP require. Aiming at the share and reuse of operations, the paper also introduces four new defined nodes: the filter node, the group by node, the merge node and the aggregation node. At the same time, it makes an improvement to the MuGA (multiply group by algorithm), and uses the filter node, merge node and join node to mark group number for each tuple in dimension and fact tables, to achieve the share of column scan and col- umn join operations. For the aggregation node, the paper proposes a multi-phase aggregation algorithm combined with compound group number of the fact table to implement effective aggregation reuse. The experimental results on the benchmark data sets SSB (star schema benchmark) also verify the effectiveness of the multi-query optimization strategy.
出处
《计算机科学与探索》
CSCD
2012年第9期852-864,共13页
Journal of Frontiers of Computer Science and Technology
基金
国家自然科学基金Nos.61070031,61070032,61103046
国家科技重大专项“核高基”项目No.2010ZX01042-001-003-004~~
关键词
列存储
联机分析处理(OLAP)
多查询优化
全局计划
操作复用
column store
on-line analytical processing (OLAP)
multi-query optimization
global query plan
operation reuse