期刊文献+

面向分布式列式存储的轨迹大数据k近邻查询 被引量:7

kNN Query Processing for Trajectory Big Data Based on Distributed Column-Oriented Storage
原文传递
导出
摘要 针对轨迹大数据的高效点-轨迹k近邻(point to trajectory k nearest neighbor, P2Tk NN)查询处理需求,提出了一种融合时空剖分和轨迹分段的轨迹组织方法,其核心思想是在对轨迹作时间剖分的基础上,利用离散全球网格系统(discrete global grid system, DGGS)在空间上进行再次剖分,从而利用两次剖分得到的时空单元编码来索引落入其中的轨迹片段。在此基础上利用分布式列式存储技术设计了面向轨迹大数据的P2Tk NN查询处理框架,提出了一种顾及轨迹数据空间分布的自适应空间单元搜索算法,即通过分析轨迹数据在给定时间约束下的空间分异特征,动态调整空间单元的搜索步长,从而提升了轨迹稀疏区域的处理效率。针对亿级轨迹的实验结果表明,该方法适用于轨迹大数据的P2Tk NN查询处理,在轨迹稠密与稀疏区域的平均查询响应时间均小于1 s。 Objectives: With the advancement of mobile object tracking technologies, huge volumes of spatiotemporal data have been accumulated in the form of location trajectories. Such data bring us new opportunities and challenges in efficient trajectory retrieval. Among them, point to trajectory k nearest neighbor(P2 Tk NN) query plays a pivotal role in many mobile object-oriented analyses and applications. It has been widely used in tracing the source of infectious diseases, recommending travel routes, and generating and updating local road networks based on trajectories.Methods: Aiming at the requirement of efficient P2 Tk NN query processing for large-scale trajectory data, a trajectory organization method combining spatiotemporal segmentation and trajectory segmentation is proposed. The core idea is to use the discrete global grid system(DGGS) to segment the trajectory in spatial based on the previous time division. The spatiotemporal cell encoding is applied to index the trajectory segments falling in it. In order to realize efficient encoding of spatiotemporal cells, three concatenated encoding structures of spatiotemporal divide and conquer are proposed to solve the huge difference in dimension and scale between temporal dimension and spatial dimension. Based on above, a P2 Tk NN query processing framework for distributed column-oriented storage is designed, and an adaptive spatial cell search algorithm that takes into account the spatiotemporal distribution of trajectory data is proposed. The step size for spatial cell searching is adaptively set according to the spatial stratified heterogeneity of the trajectory data under the given time constraint, which greatly improves the search efficiency of areas with sparse trajectory data.Results: Experimental results based on billion-level trajectories show that the method is suitable for P2 Tk NN query processing of big trajectory data.(1) Different encoding structures are suitable for different query scenarios. Appropriate encoding structure can be used to make query response time less than 1 s.(2) Adaptive spatial cell search algorithm can significantly improve query efficiency, making the average query response delay of random one thousand queries less than 1 second on both dense and sparse areas.(3) The spatial cell.s k rate has an important influence on the query. After the k rate reaches 0.5, the query response time tends to balance.Conclutions: We propose a nearest neighbor query processing framework and its search algorithm for distributed column-oriented storage.Firstly,the framework organizes the trajectory in segments based on the spatiotemporal cells,and uses concatenated spatiotemporal encoding to realize spatiotemporal cluster storage of trajectory segments.Secondly, an algorithm that can perceive the spatial differentiation characteristics of trajectory data under a given time constraint and dynamically adjust the search step size of the spatial cell is designed, which greatly improves the processing efficiency of the data sparse area. The adaptive search strategy is applicable to any DGGS based on recursive subdivision, and it has strong scalability and can seamlessly connect with other subdivision frameworks, such as S2 Geometry, H3, etc.
作者 余列冰 向隆刚 孙尚宇 关雪峰 吴华意 YU Liebing;XIANG Longgang;SUN Shangyu;GUAN Xuefeng;WU Huayi(State Key Laboratory of Information Engineering in Surveying,Mapping and Remote Sensing,Wuhan University,Wuhan 430079,China)
出处 《武汉大学学报(信息科学版)》 EI CAS CSCD 北大核心 2021年第5期736-745,共10页 Geomatics and Information Science of Wuhan University
基金 国家重点研发计划(2018YFB2100603) 国家自然科学基金(41771474,41930107)。
关键词 轨迹大数据 K近邻查询 时空编码 自适应搜索 分布式列式存储 trajectory data k nearest neighbor query spatiotemporal encoding adaptive search distributed column-oriented storage
  • 相关文献

参考文献4

二级参考文献34

共引文献248

同被引文献115

引证文献7

二级引证文献1

相关作者

内容加载中请稍等...

相关机构

内容加载中请稍等...

相关主题

内容加载中请稍等...

浏览历史

内容加载中请稍等...
;
使用帮助 返回顶部