Abstract
Star-join is an important operation in OLAP: the large fact table is joined with multiple dimension tables to perform multidimensional analytical processing, so star-join performance is dominated by join performance. State-of-the-art research has focused mainly on optimizing hash join for different processor platforms, yet finding the optimal hash join parameters or implementation remains a complex problem. Hash join does not rely on the semantic information of the schema, whereas the join between the fact table and a dimension table can be optimized further by exploiting the dimension mapping feature. This paper proposes a novel vector index for OLAP workloads to improve join performance between the fact table and dimension tables. From the schema perspective, a dimension table can be mapped to a vector index, and each fact tuple can be mapped directly to the corresponding position in the vector index, with no need for a value-matching hash join. From the implementation perspective, the vector index combines bitmap index, dictionary compression, the primary key-foreign key (PK-FK) referential integrity constraint, and join index. This systematic design lets the vector index play multiple roles in query processing: (1) it acts as a filter, like a bitmap index; (2) it uses more bits per cell than the 0 or 1 of a bitmap index, so it can represent more information; (3) an auto-incrementing primary key is mapped or created to serve as the vector address and the corresponding foreign keys are updated, which converts the PK-FK referential integrity constraint into a vector referencing constraint; (4) the foreign key join is simplified to referencing a vector cell by the foreign key value. With the vector index, the costly star-join of OLAP can be abstracted as vector index computation, and an OLAP query can be simplified to a single-table scan driven by vector indexes. The simplified design not only improves performance but also reduces the complexity of implementation on the GPU platform. We first discuss the vector index mechanism and how to apply it in a database, then design an update mechanism that preserves the vector referencing constraint during updates, and finally propose a vector index based OLAP framework to improve the OLAP performance of in-memory databases. The vector index based star-join can serve as an OLAP accelerator on GPU, so that the CPU can offload the compute-intensive OLAP workload to a high-performance GPU platform. Experimental results show that the update cost of the vector index is low, while the performance gain from vector referencing is large. More importantly, the vector index allows the star-join to be accelerated outside the in-memory database engine, which reduces the CPU load of the in-memory database or moves the star-join workload to a hardware accelerator such as a GPU. The vector index based star-join significantly improves star-join performance on both CPU and GPU platforms: compared with the leading in-memory database Vector, the maximum gain is 3x, achieved on SSB Q4.1, and the average gain is 1.2x.
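The mapping described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the table contents, the column names (`custkey`, `datekey`, `revenue`), the `NULL` marker, and both function names are assumptions made for illustration. It assumes each dimension table already uses primary keys 0..n-1, so a foreign key value is directly a position in the vector.

```python
from collections import defaultdict

NULL = -1  # hypothetical marker: dimension row filtered out by the predicate


def build_vector_index(dim_rows, predicate, group_attr):
    """Map a dimension table (primary keys 0..n-1) to a vector of group codes.

    Each cell holds a dictionary-compressed code for the grouping attribute,
    or NULL when the row fails the predicate (the bitmap-like filter role).
    """
    dictionary = []  # group code -> attribute value
    codes = {}       # attribute value -> group code
    vector = [NULL] * len(dim_rows)
    for pk, row in enumerate(dim_rows):
        if not predicate(row):
            continue
        value = row[group_attr]
        if value not in codes:
            codes[value] = len(dictionary)
            dictionary.append(value)
        vector[pk] = codes[value]
    return vector, dictionary


def star_join_aggregate(fact_rows, indexes, measure):
    """Single scan of the fact table; each join is a positional vector lookup."""
    totals = defaultdict(int)
    for row in fact_rows:
        group = []
        for fk_column, vector, dictionary in indexes:
            code = vector[row[fk_column]]  # foreign key value used as address
            if code == NULL:
                break                      # fact row filtered out
            group.append(dictionary[code])
        else:
            totals[tuple(group)] += row[measure]
    return dict(totals)


# Toy data, invented for this example.
customer = [
    {"region": "ASIA", "nation": "CHINA"},
    {"region": "EUROPE", "nation": "FRANCE"},
    {"region": "ASIA", "nation": "JAPAN"},
]
date = [{"year": 1997}, {"year": 1998}]
lineorder = [
    {"custkey": 0, "datekey": 0, "revenue": 10},
    {"custkey": 1, "datekey": 0, "revenue": 20},
    {"custkey": 2, "datekey": 1, "revenue": 30},
    {"custkey": 0, "datekey": 1, "revenue": 40},
    {"custkey": 2, "datekey": 0, "revenue": 5},
]

cust_vec, cust_dict = build_vector_index(
    customer, lambda r: r["region"] == "ASIA", "nation")
date_vec, date_dict = build_vector_index(date, lambda r: True, "year")

result = star_join_aggregate(
    lineorder,
    [("custkey", cust_vec, cust_dict), ("datekey", date_vec, date_dict)],
    "revenue",
)
```

In this sketch no hash table is probed during the fact table scan; every join reduces to an array access, which is the property the abstract credits for the simpler GPU implementation.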
Authors
张延松
张宇
王珊
ZHANG Yan-Song; ZHANG Yu; WANG Shan (Key Laboratory of Data Engineering and Knowledge Engineering (Renmin University), Ministry of Education, Beijing 100872; School of Information, Renmin University of China, Beijing 100872; National Survey Research Center at Renmin University of China, Beijing 100872; National Satellite Meteorological Centre, Beijing 100081)
Source
《计算机学报》
EI
CSCD
PKU Core Journals
2019, No. 8, pp. 1686-1703 (18 pages)
Chinese Journal of Computers
Funding
National Natural Science Foundation of China (61772533, 61732014)
Beijing Natural Science Foundation (4192066)