期刊文献+

大规模时空轨迹数据连接查询效率优化实践 被引量:1

Practice of Improving Join Query Efficiency for Large Scale Spatiotemporal Trajectory Data
下载PDF
导出
摘要 本文提出一种低集群计算资源条件下,大规模轨迹类数据同时空关系的快速连接查询算法DPCP-CROSSJOIN.该算法通过对轨迹数据时间字段进行分段交叉编码和位置网格化等方式对连续的轨迹数据离散化,并以日期和网格区域编码进行两级分区存储.通过交叉“等值”连接查询,实现时空连接查询的三级索引、四级加速,将n·n对象间同时空关系连接查询时间复杂度从O(n^(2))降为O(nlogn).在Hadoop集群上使用Hive和TEZ等进行大规模轨迹数据连接查询时能将连接查询效率最高提升到30.66倍.该算法以时间段编码作为关联条件,巧妙绕开连接过程中复杂表达式的实时计算,以“等值”替代复杂表达式计算连接,提高MapReduce任务并行度,提升集群存储和计算资源利用率.在面对仅使用一般优化已几乎无法完成的,更大规模类似任务,仍能在数分钟内完成.实验表明,该算法具有高效和稳定等特性,尤其适用低“算力”资源条件下大规模轨迹数据的同时空关系连接查询.此方法还可作为时空轨迹伴随查找,对象间关系亲密度判定等的原子算法,可广泛应用于维护国家安全、社会治安秩序,预防和打击犯罪,辅助城乡规划统筹等领域. This study proposes an algorithm named DPCP-CROSS-JOIN for fast co-spatiotemporal relationship join queries of large-scale trajectory data in insufficient cluster computing resource environments.The proposed algorithm discretizes continuous trajectory data by segmenting and cross-coding the temporal fields of trajectory data and conducting spatiality gridded coding and then stores the data in two-level partitions using date and grid region coding.It achieves 3-level indexing and 4-level acceleration for spatiotemporal join queries through cross “equivalent” join queries.As a result,the time complexity of the co-spatiotemporal relationship join queries among n· n objects is reduced from O(n~2) to O(nlogn).It can improve the efficiency of join queries by up to 30.66 times when Hive and TEZ are used on a Hadoop cluster for join queries of large-scale trajectory data.This algorithm uses time-slice and gridding coding as the join condition,thereby cleverly bypassing the real-time calculation of complex expressions during the join process.Moreover,complex expression calculation join is replaced with “equivalent” join to improve the parallelism of MapReduce tasks and enhance the utilization rates of cluster storage and computing resources.Similar tasks of larger scales of trajectory data that are almost impossible to accomplish using general optimization methods can still be completed by the proposed algorithm within a few minutes.The experimental results suggest that the proposed algorithm is efficient and stable,and it is especially suitable for the co-spatiotemporal relationship join queries of large-scale trajectory data under insufficient computing resource conditions.It can also be used as an atomic algorithm for searching accompanying spatiotemporal trajectories and determining the intimacy of relationships among objects.It can be widely applied in fields such as national security and social order maintenance,crime prevention and combat,and urban and rural planning support.
作者 丁强龙 叶惠珠 袁弘强 李志新 DING Qiang-Long;YE Hui-Zhu;YUAN Hong-Qiang;LI Zhi-Xin(Detachment of Science and Information Technology,Kunming Public Security Bureau,Kunming 650217,China;The College of Arts and Sciences·Kunming,Kunming 650221,China;Intelligence and Command Center,Kunming Public Security Bureau,Kunming 650217,China)
出处 《计算机系统应用》 2024年第5期1-14,共14页 Computer Systems & Applications
基金 公安部应用创新计划(2019YYCXYNST019)。
关键词 轨迹数据 三级时空索引 复杂表达式连接查询 交叉编码 同时空 低算力条件 trajectory data 3-level spatio-temporal index complex expression join query cross coding co spatio-temporal low computing power condition
  • 相关文献

参考文献9

二级参考文献46

  • 1Ghemawat S, Gobioff H, Leung ST. The Google file system. In: Proc. of the SOSP 2003. 2003.20-43. [doi: 10.1145/1165389. 945450].
  • 2Dean J, Ghemawat S. MapReduce: Simplified data processing on large clusters. In: Proc. of the OSDI 2004. 2004. 137-150. [doi: 10.1145/1327452.1327492].
  • 3Yang HC, Dasdan A, Hsiao RL, Parker DS. Map-Reduce-Merge: Simplified relational data processing on large cluster. In: Proc. of the SIGMOD 2007. 2007. 1029-1040. [doi: 10.1145/1247480.1247602].
  • 4Lammel R. Google's MapReduce programming model Revisited. Science Computer Program, 2008,70(1):1-30. [doi: 10.1016/ j .scico .2007.07.001 ].
  • 5Thusoo A, Sarma JS, Jain N, Shao Z, Chakka P, Anthony S, Liu H, Wyckoff P, Murthy R. Hi:ce: A warehousing solution over a map-reduce framework. Proc. of the VLDB Endowment, 2009,2(2): 1626-1627.
  • 6Thusoo A, Sarma JS, Jain N, Shao Z, Chakka P, Zhang N, Antony S, Liu H, Murthy R. Hive--A petabyte scale data warehouse using Hadoop data engineering. In: Proc. of the ICDE. 2010. 996-1005. [doi: 10.1109/ICDE.2010.5447738].
  • 7Olston C, Reed B, Sirvastava U, Kumar R, Tomkins A. Pig Latin: A not-so-foreign language for data processing. In: Proc. of the SIGMOD. 2008. 1099-1110. [doi: 10.1145/1376616.1376726].
  • 8White T. Hadoop: The Definitive Guide. O'Reilly, 2009.
  • 9Apache Hadoop. http://hadoop.apache.org/.
  • 10Murty J. Programming Amazon Web Services: S3, EC2, SQS, FPS, and SimpleDB. O'Reilly, 2008.

共引文献237

同被引文献3

引证文献1

相关作者

内容加载中请稍等...

相关机构

内容加载中请稍等...

相关主题

内容加载中请稍等...

浏览历史

内容加载中请稍等...
;
使用帮助 返回顶部