Optimization Method of Projection and Order for Multiple Tables Join (cited by: 2)
Abstract: Multi-table join is a common operation in big data processing. As with join operations in databases, the order in which multiple tables are joined has a great impact on the consumption of computing and transmission resources. Optimizing the join order of multiple tables is a classical optimization problem. In addition, the size of the projected result of each join affects the volume of data transmitted between nodes. Both the overall join order and the projection applied at each join therefore have a significant impact on join efficiency. Traditional optimization strategies, however, often consider neither which intermediate projections to keep nor how those intermediate projections affect the optimal join strategy. To address this problem, this paper establishes a join relation index that can adjust the projection of each join while the optimized join strategy is being constructed, deleting redundant columns promptly and reducing the consumption of transmission resources. The join order is then adjusted on the basis of the optimized projections so that, from a global perspective, the consumption of both transmission and computing resources is reduced as far as possible. The strategy was implemented in the Flink system and tested experimentally, and the results show a significant optimization effect.
Authors: ZONG Fengbo; ZHAO Yuhai; WANG Guoren; JI Hangxu (School of Computer Science and Engineering, Northeastern University, Shenyang 110169, China; School of Computer Science and Technology, Beijing Institute of Technology, Beijing 100081, China)
Source: Journal of Frontiers of Computer Science and Technology (《计算机科学与探索》), indexed in CSCD and the Peking University Core Journals list, 2022, No. 1, pp. 106-119 (14 pages)
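Funding: National Key Research and Development Program of China, Ministry of Science and Technology (2018YFB1004402); National Natural Science Foundation of China (61772124).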
Keywords: big data; join optimization; projection optimization
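
The abstract combines two ideas: dropping columns as soon as no later join or the final output needs them, and choosing the join order with those reduced intermediate results in mind. The Python sketch below illustrates both on a toy in-memory example. It is not the paper's join relation index or its Flink implementation. The table contents, the helper names (`project`, `hash_join`, `cells`, `run_plan`, `best_order`), and the cost proxy (the number of stored values moved between steps) are all assumptions made for illustration.

```python
from itertools import permutations

def project(rows, cols):
    """Keep only the named columns of every row."""
    return [{c: r[c] for c in cols if c in r} for r in rows]

def hash_join(left, right, key):
    """Equi-join two row lists on a shared column via a hash index."""
    index = {}
    for r in right:
        index.setdefault(r[key], []).append(r)
    return [{**l, **r} for l in left for r in index.get(l[key], [])]

def cells(rows):
    """Assumed proxy for transfer volume: total number of stored values."""
    return sum(len(r) for r in rows)

def run_plan(tables, order, join_keys, output_cols, prune):
    """Run a left-deep join plan; return (final rows, transferred cells)."""
    pending = dict(join_keys)  # join predicates not yet executed

    def needed():
        # Columns still required: the output plus keys of remaining joins.
        return set(output_cols) | set(pending.values())

    def trim(rows):
        return project(rows, needed()) if prune else rows

    acc = trim(tables[order[0]])
    joined = {order[0]}
    cost = cells(acc)
    for name in order[1:]:
        # Find the predicate linking `name` to a table joined so far.
        pair = next((p for p in pending if name in p and p - {name} <= joined), None)
        if pair is None:
            raise ValueError(f"{name} has no join predicate with {sorted(joined)}")
        right = trim(tables[name])       # prune the input before shipping it
        acc = hash_join(acc, right, pending.pop(pair))
        joined.add(name)
        acc = trim(acc)                  # drop the key if no later join uses it
        cost += cells(right) + cells(acc)
    return project(acc, output_cols), cost

def best_order(tables, join_keys, output_cols, prune=True):
    """Exhaustively pick the cheapest order that avoids cross products."""
    best = None
    for order in permutations(tables):
        try:
            _, cost = run_plan(tables, order, join_keys, output_cols, prune)
        except ValueError:
            continue
        if best is None or cost < best[1]:
            best = (order, cost)
    return best

# Hypothetical toy data: orders reference customers and products.
tables = {
    "orders": [
        {"order_id": 1, "cust_id": 10, "prod_id": 100, "note": "gift"},
        {"order_id": 2, "cust_id": 10, "prod_id": 101, "note": ""},
        {"order_id": 3, "cust_id": 11, "prod_id": 100, "note": "rush"},
        {"order_id": 4, "cust_id": 12, "prod_id": 102, "note": ""},
    ],
    "customers": [
        {"cust_id": 10, "name": "Ann", "address": "1 Main St"},
        {"cust_id": 11, "name": "Bob", "address": "2 Oak Ave"},
    ],
    "products": [
        {"prod_id": 100, "title": "pen", "description": "blue ballpoint"},
        {"prod_id": 101, "title": "ink", "description": "black refill"},
        {"prod_id": 102, "title": "pad", "description": "A5 notepad"},
    ],
}
join_keys = {
    frozenset({"orders", "customers"}): "cust_id",
    frozenset({"orders", "products"}): "prod_id",
}
output_cols = ["name", "title"]

if __name__ == "__main__":
    fixed = ("orders", "customers", "products")
    print(run_plan(tables, fixed, join_keys, output_cols, prune=False)[1])
    print(run_plan(tables, fixed, join_keys, output_cols, prune=True)[1])
    print(best_order(tables, join_keys, output_cols))
```

Running `run_plan` with `prune=True` and `prune=False` on the same order returns the same final rows but a smaller cell count for the pruned plan. `best_order` enumerates every permutation, which is practical only for a handful of tables. A real optimizer would use dynamic programming or heuristics and estimated sizes rather than executed ones.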