摘要
在列数据库中,连接操作依然是最核心和最耗时的操作,GPU强大的计算能力可为此提供新的优化手段。基于Fermi架构,提出了新的Hash Join算法和Sort-merge Join算法,其基本思想是充分利用该架构新增的缓存结构来减少连接操作的cache缺失率。与CUDA stream技术相结合,新算法在输出结果较多时可以有效地隐藏主存与显存间数据传输带来的延迟,进一步提升其执行效率。实验结果证实了基于Fermi架构的Hash Join算法处理偏斜数据的高效性及Sort-merge Join算法的稳定性,并且通过比较表明,这两种算法的性能全面优于基于多核CPU充分优化的Join算法,最大加速2.4倍,在外键分布高偏斜时新的Hash Join算法的执行速度甚至达到每秒217M元组。
In column database, the join operation is still the most important and the most time-consuming operatton. GPUs' powerful computing capabilities can provide new optimizing solutions. This paper proposed a new Hash Join al- gorithm and a new Sort-merge Join which are both based on Fermi architecture. These two new implementation approa- ches take full advantage of the new parallel cache hierarchy of Fermi, and successfully reduce the cache miss rate of the join operation. Combined with CUDA stream technology, the new ioin algorithms can effectively hide the data transfer delay between the main memory and glohal memory, when the join operation generates a large number of results. The experimental results show that Hash Join based on Fermi performs better when it deals with data skew, while Sort- merge Join based on Fermi is more stable. Comparing with the implementations based on multi-core CPUs, these two al- gorithms are faster, speed up 2. 4 times at most, and the new Hash Join even achieves 217M tuples per second when fo- reign key's dataset is high skew.
出处
《计算机科学》
CSCD
北大核心
2013年第3期62-67,共6页
Computer Science
基金
广东省科技计划项目(2011A010801008
2011A090200122
2011A090200027)资助