A Multi-Way Spatial Join Querying Processing Algorithm Based on Spark
Abstract: To address spatial join query processing in cloud environments, this paper proposes BSMWSJ, a multi-way spatial join query processing algorithm based on the Spark platform. The algorithm uses grid partitioning to divide the whole data space into grid cells of equal size and assigns the spatial objects of each data set to the corresponding cells according to their spatial locations. The spatial objects in different grid cells are then joined in parallel. During multi-way spatial join processing, a boundary filtering method removes useless data: the MBRs (minimum bounding rectangles) of the candidate results produced by the preceding joins are computed and used to filter the subsequent join data sets. This discards useless join objects and reduces the redundant projection and replication of spatial objects. In addition, a duplication avoidance strategy reduces the output of duplicate results, which further lowers the cost of the subsequent join computation. Extensive experiments on synthetic and real data sets show that BSMWSJ clearly outperforms existing multi-way join query processing algorithms.
Source: Journal of Computer Research and Development (《计算机研究与发展》), indexed in EI, CSCD, and the Peking University Core Journals list; 2017, No. 7, pp. 1592-1602 (11 pages)
Funding: National Natural Science Foundation of China (61073063, 61332006); Special Research Fund for Marine Public Welfare Industry (201105033)
Keywords: cloud computing; Spark platform; multi-way spatial join query; boundary filtering; duplication avoidance
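
The three ideas named in the abstract (grid partitioning, boundary filtering with the MBR of earlier candidate results, and duplication avoidance) can be illustrated with a minimal single-machine sketch. This is not the BSMWSJ implementation, since the abstract gives no code. It rests on the following assumptions: objects are axis-aligned rectangles `(xmin, ymin, xmax, ymax)`, the join predicate is rectangle intersection, the query is a chain A-B-C, one global MBR is used as the filter bound, and duplicates are avoided with the common reference-point technique. All function names (`cells_of`, `pair_join`, `three_way_chain_join`) are hypothetical, and plain Python dictionaries stand in for Spark partitions.

```python
from collections import defaultdict
from itertools import product

def intersects(a, b):
    """True if two rectangles (xmin, ymin, xmax, ymax) overlap or touch."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]

def union_mbr(rects):
    """Minimum bounding rectangle covering all given rectangles."""
    x0, y0, x1, y1 = zip(*rects)
    return (min(x0), min(y0), max(x1), max(y1))

def cells_of(rect, cell):
    """Yield every grid cell (i, j) of side length `cell` that rect overlaps."""
    for i in range(int(rect[0] // cell), int(rect[2] // cell) + 1):
        for j in range(int(rect[1] // cell), int(rect[3] // cell) + 1):
            yield (i, j)

def partition(rects, cell):
    """Grid partitioning: replicate each rectangle into every cell it overlaps."""
    grid = defaultdict(list)
    for r in rects:
        for c in cells_of(r, cell):
            grid[c].append(r)
    return grid

def pair_join(left, right, cell):
    """Grid-based intersection join; each cell is an independent task."""
    gl, gr = partition(left, cell), partition(right, cell)
    out = []
    for c in gl.keys() & gr.keys():
        for a, b in product(gl[c], gr[c]):
            if intersects(a, b):
                # Duplication avoidance (assumed reference-point technique):
                # report the pair only in the cell that contains the
                # lower-left corner of the intersection.
                ref = (int(max(a[0], b[0]) // cell), int(max(a[1], b[1]) // cell))
                if ref == c:
                    out.append((a, b))
    return out

def three_way_chain_join(A, B, C, cell):
    """Chain join A-B-C with boundary filtering applied to C."""
    ab = pair_join(A, B, cell)
    if not ab:
        return []
    # Boundary filtering: only C objects that intersect the MBR of the
    # B-side of the candidate results can contribute to the final result.
    bound = union_mbr([b for _, b in ab])
    c_kept = [c for c in C if intersects(c, bound)]
    bs = list(dict.fromkeys(b for _, b in ab))  # distinct B objects
    by_b = defaultdict(list)
    for b, c in pair_join(bs, c_kept, cell):
        by_b[b].append(c)
    return [(a, b, c) for a, b in ab for c in by_b[b]]

if __name__ == "__main__":
    A = [(0, 0, 2, 2), (10, 10, 11, 11)]
    B = [(1, 1, 6, 6), (20, 20, 21, 21)]
    C = [(5, 5, 7, 7), (30, 30, 31, 31), (2.5, 2.5, 3, 3)]
    for triple in sorted(three_way_chain_join(A, B, C, cell=4)):
        print(triple)
```

In this sketch the rectangle `(30, 30, 31, 31)` in `C` is dropped before it is ever replicated into grid cells, which is the saving the abstract attributes to boundary filtering. The sketch assumes the rectangles within each input list are distinct.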