
A join algorithm based on Bloom filter in OceanBase

Cited by: 1
Abstract In the era of big data, the "de-IOE" campaign and events such as Double 11 place higher demands on distributed database systems. OceanBase is an open-source distributed database developed in-house by Alibaba Group. It supports cross-row, cross-table transactions over massive data, but its performance on complex queries still needs improvement; in particular, the network transmission caused by join operations seriously degrades database performance. This paper proposes a join algorithm based on a Bloom filter. It builds a Bloom filter on the join column of the left table and uses it to filter the data of the right table, which reduces the overhead of unnecessary data transmission and the memory consumed by data processing. The algorithm is implemented in OceanBase, and the experimental results show that it greatly improves the efficiency of join operations.
Source Journal of East China Normal University (Natural Science), 2016, No. 5, pp. 67-74, 102 (9 pages). Indexed in: CAS, CSCD, Peking University Core Journals.
Funding National 863 Program of China (2015AA015307)
Keywords OceanBase; join operation; Bloom filter
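
The filtering idea described in the abstract can be illustrated with a small sketch. This is not the paper's OceanBase implementation; it is a minimal, single-process Python illustration under stated assumptions. The names `BloomFilter`, `bloom_filter_join`, `left_key`, and `right_key` are hypothetical, the filter size and hash count are arbitrary, and the "shipping" of right-table rows over the network is modeled simply as building a list.

```python
import hashlib


class BloomFilter:
    """Minimal Bloom filter: no false negatives, occasional false positives."""

    def __init__(self, num_bits=1024, num_hashes=3):
        # Sizes are illustrative assumptions, not values from the paper.
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, key):
        # Derive num_hashes bit positions by salting a single hash function.
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(f"{seed}:{key!r}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, key):
        for pos in self._positions(key):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, key):
        return all(
            self.bits[pos // 8] & (1 << (pos % 8)) for pos in self._positions(key)
        )


def bloom_filter_join(left_rows, right_rows, left_key, right_key):
    """Inner equi-join; returns (joined_rows, number_of_right_rows_shipped)."""
    # Step 1: build the filter on the join column of the left table.
    bloom = BloomFilter()
    for row in left_rows:
        bloom.add(row[left_key])

    # Step 2: only right-table rows that pass the filter are "shipped".
    shipped = [row for row in right_rows if bloom.might_contain(row[right_key])]

    # Step 3: an exact hash join on the shipped rows discards false positives.
    by_key = {}
    for row in left_rows:
        by_key.setdefault(row[left_key], []).append(row)
    joined = [
        {**left, **right}
        for right in shipped
        for left in by_key.get(right[right_key], [])
    ]
    return joined, len(shipped)


if __name__ == "__main__":
    left = [{"id": 1, "name": "a"}, {"id": 2, "name": "b"}]
    right = [{"uid": i, "amount": i * 10} for i in range(100)]
    rows, shipped_count = bloom_filter_join(left, right, "id", "uid")
    print(rows)
    print("right rows shipped:", shipped_count, "of", len(right))
```

Because a Bloom filter never yields false negatives, every matching right-table row passes step 2, and the exact join in step 3 removes any false positives, so the result equals that of an ordinary inner join while fewer right-table rows need to be transferred.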