Abstract
Large fleets of buses, taxis, and ride-hailing vehicles serve millions of urban residents every day. The massive volume of GPS trajectory data these vehicles generate places a heavy storage-cost burden on traffic big data platforms, so efficient compression of this data is urgently needed. This paper uses Spark, a big data computing engine, to implement a set of classical trajectory compression algorithms in parallel: Douglas-Peucker, Top-Down Time-Ratio, Sliding Window, and SQUISH. The approach is validated on a very large real-world dataset containing 117.5 GB of GPS trajectory data produced by 31,000 vehicles over one month. The experimental results show good performance: compressing the entire dataset takes only 438 s on a 14-node Spark cluster.
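As an illustration of the kind of per-trajectory work the abstract describes, the following is a minimal sketch of Douglas-Peucker applied independently to each vehicle's trajectory. It is not the authors' implementation: the function names (`point_segment_distance`, `douglas_peucker`, `compress_by_vehicle`), the record layout `(vehicle_id, timestamp, x, y)`, and the use of planar coordinates instead of latitude/longitude are all assumptions made for the example. The grouping step is written in plain Python; in Spark, one plausible equivalent would be keying records by vehicle ID and applying the same function per key (for example with the RDD operations `groupByKey` and `mapValues`), which is what makes the problem easy to parallelize.

```python
import math
from collections import defaultdict


def point_segment_distance(p, a, b):
    """Distance from point p to segment a-b, assuming planar (x, y) coordinates."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    if dx == 0 and dy == 0:
        # Degenerate segment: fall back to point-to-point distance.
        return math.hypot(px - ax, py - ay)
    # Project p onto the segment and clamp the projection to its endpoints.
    t = ((px - ax) * dx + (py - ay) * dy) / (dx * dx + dy * dy)
    t = max(0.0, min(1.0, t))
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))


def douglas_peucker(points, epsilon):
    """Keep the endpoints and, recursively, any point farther than epsilon
    from the segment joining the endpoints of its sub-trajectory."""
    if len(points) < 3:
        return list(points)
    first, last = points[0], points[-1]
    max_dist, index = 0.0, 0
    for i in range(1, len(points) - 1):
        d = point_segment_distance(points[i], first, last)
        if d > max_dist:
            max_dist, index = d, i
    if max_dist <= epsilon:
        return [first, last]
    left = douglas_peucker(points[: index + 1], epsilon)
    right = douglas_peucker(points[index:], epsilon)
    # The split point ends `left` and starts `right`; keep it once.
    return left[:-1] + right


def compress_by_vehicle(records, epsilon):
    """Group (vehicle_id, timestamp, x, y) records by vehicle, order each
    group by time, and compress each trajectory independently."""
    grouped = defaultdict(list)
    for vehicle_id, ts, x, y in records:
        grouped[vehicle_id].append((ts, x, y))
    compressed = {}
    for vehicle_id, pts in grouped.items():
        pts.sort()  # sort by timestamp (first tuple element)
        trajectory = [(x, y) for _, x, y in pts]
        compressed[vehicle_id] = douglas_peucker(trajectory, epsilon)
    return compressed
```

Because each vehicle's trajectory is compressed without reference to any other vehicle, the per-key function has no shared state. The recursive form shown here is only suitable for short trajectories; an iterative version would be needed for very long ones.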
Authors
Li Hao (李浩)
Xiong Wen (熊文)
Liu Dage (柳大格)
College of Information Science and Technology, Yunnan Normal University, Kunming 650500, China
Source
Jiangsu Science and Technology Information (《江苏科技信息》)
2023, No. 1, pp. 72-74 (3 pages)
Funding
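National Natural Science Foundation of China, project "Research on Key Technologies for Benchmarking and Performance Optimization of Urban Traffic Big Data Platforms", grant No. 61862066.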