Abstract
Spatial aggregate queries are an effective way to analyze rapidly growing volumes of spatial data, although they can be compute-intensive. Traditional single-machine serial methods can no longer meet the demands of online analysis, yet aggregation indexes designed specifically for spatial data on parallel, scalable computing architectures have received little research attention. This paper therefore proposes two new index-based methods to support parallel online spatial aggregation. The first method uses a parallel two-level spatial index, in which a global grid index filters the relevant local indexes and the local indexes accelerate aggregate queries locally, improving the efficiency of exact aggregate queries. The second method builds on the first by constructing random samples and applying adaptive data-brick partitioning, dynamic caching, and other optimizations. For any given confidence level it returns aggregate results with confidence intervals, and accuracy improves as more samples are retrieved. Experiments on billion-scale data show that the methods are effective and feasible, and that they scale to some degree.
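The idea of returning an aggregate with a confidence interval that tightens as samples accumulate can be illustrated with a minimal sketch. This is not the paper's implementation: the function name `online_mean`, the batch size, and the use of a normal approximation (central limit theorem) without a finite-population correction are all assumptions made for illustration, and the spatial indexing, partitioning, and caching steps are omitted.

```python
import math
import random
from statistics import NormalDist


def online_mean(values, confidence=0.95, batch=1000, seed=0):
    """Yield (n, estimate, half_width) after each batch of random samples.

    Hypothetical helper: estimates the mean of `values` from a growing
    random sample and reports a normal-approximation confidence interval.
    """
    rng = random.Random(seed)
    order = list(range(len(values)))
    rng.shuffle(order)  # visit records in random order (sampling without replacement)

    # Two-sided critical value for the requested confidence level.
    z = NormalDist().inv_cdf(0.5 + confidence / 2)

    n, mean, m2 = 0, 0.0, 0.0  # Welford running mean and sum of squared deviations
    for idx in order:
        x = values[idx]
        n += 1
        delta = x - mean
        mean += delta / n
        m2 += delta * (x - mean)
        if n % batch == 0 or n == len(values):
            variance = m2 / (n - 1) if n > 1 else 0.0
            half_width = z * math.sqrt(variance / n)
            yield n, mean, half_width


if __name__ == "__main__":
    data_rng = random.Random(42)
    # Synthetic attribute values standing in for the records matched by a query window.
    values = [data_rng.gauss(50.0, 10.0) for _ in range(20000)]
    for n, estimate, half_width in online_mean(values, batch=5000):
        print(f"n={n:6d}  mean={estimate:.3f} +/- {half_width:.3f}")
```

Each yielded tuple is a progressively refined answer: a caller can stop as soon as the half-width is small enough, or keep consuming samples until the full data set has been read, at which point the estimate equals the exact mean.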
Authors
申金鑫
吴烨
陈荦
景宁
SHEN Jinxin; WU Ye; CHEN Luo; JING Ning (College of Electronic Science, National University of Defense Technology, Changsha 410073, China)
Source
《计算机科学与探索》
CSCD
PKU Core (Peking University Core Journals)
2018, No. 10, pp. 1559-1570 (12 pages)
Journal of Frontiers of Computer Science and Technology
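Funding
National Natural Science Foundation of China, No. 41471321
National High Technology Research and Development Program of China (863 Program), No. 2015AA123901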
Keywords
aggregation computation
approximate query
spatial index
online analysis