Abstract
To improve the matching accuracy of low-frequency trajectory data (sampling interval greater than 1 min), this paper proposes HMDP-Q (History Markov Decision Processes Q-learning), a matching algorithm based on reinforcement learning and historical trajectories. First, historical paths are extracted with an incremental matching algorithm to build a historical reference library. The candidate path set is then filtered according to this library, the shortest path, and reachability. Next, the map-matching process is modeled as a Markov decision process whose reward function is constructed from the distance by which trajectory points deviate from the road and from the historical trajectories. A reinforcement learning algorithm then solves the Markov decision process for the maximum reward value, which corresponds to the optimal match between the trajectory and the road network. Finally, the algorithm is evaluated on floating car trajectory data from a city. The results show that the algorithm effectively improves the matching accuracy between trajectory data and the road network: it reaches 89.2% accuracy at a 1 min low-frequency sampling interval and 61.4% at a 16 min sampling interval. Compared with IVVM (Interactive Voting-based Map Matching), HMDP-Q achieves both higher matching accuracy and higher computational efficiency; at the 16 min sampling interval, it improves matching accuracy by 26%.
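The abstract describes the pipeline only in prose. As a rough illustration of the core step, the Python sketch below applies tabular ε-greedy Q-learning to a chain of pre-filtered candidate road segments, with a reward that combines the point-to-road deviation distance and a bonus for segments found in the historical reference library. All names (q_learning, reward, w_hist), the toy data, and the specific reward weighting are illustrative assumptions, not the paper's exact formulation.

```python
import random
from collections import defaultdict

def reward(deviation_m, on_historical_path, w_hist=50.0):
    # Smaller point-to-road deviation yields a larger reward; segments that
    # appear in the historical reference library receive a fixed bonus.
    return -deviation_m + (w_hist if on_historical_path else 0.0)

def q_learning(candidates, deviations, historical,
               episodes=500, alpha=0.1, gamma=0.9, epsilon=0.2):
    # candidates[i]   : candidate segment ids for GPS point i (assumed already
    #                   filtered by the history library, shortest path, reachability)
    # deviations[i][s]: deviation (m) of point i from candidate segment s
    # historical      : set of segment ids drawn from historical trajectories
    Q = defaultdict(float)
    n = len(candidates)
    for _ in range(episodes):
        for i in range(n - 1):
            opts = candidates[i + 1]
            # epsilon-greedy choice of the next point's segment
            if random.random() < epsilon:
                nxt = random.choice(opts)
            else:
                nxt = max(opts, key=lambda s: Q[(i + 1, s)])
            r = reward(deviations[i + 1][nxt], nxt in historical)
            best_next = (max(Q[(i + 2, s)] for s in candidates[i + 2])
                         if i + 2 < n else 0.0)
            # standard Q-learning update toward r + gamma * max_a' Q(s', a')
            Q[(i + 1, nxt)] += alpha * (r + gamma * best_next - Q[(i + 1, nxt)])
    # greedy decode: first point by smallest deviation, the rest by largest Q;
    # this corresponds to reading off the maximum-reward matching
    path = [min(candidates[0], key=lambda s: deviations[0][s])]
    path += [max(candidates[i], key=lambda s: Q[(i, s)]) for i in range(1, n)]
    return path

if __name__ == "__main__":
    # toy data with hypothetical segment ids; "b1" and "c2" lie on a
    # historical path, so they are rewarded in addition to low deviation
    cands = [["a1", "a2"], ["b1", "b2"], ["c1", "c2"]]
    devs = [{"a1": 12.0, "a2": 40.0},
            {"b1": 8.0, "b2": 30.0},
            {"c1": 15.0, "c2": 5.0}]
    print(q_learning(cands, devs, historical={"b1", "c2"}))
```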
Source
Acta Geodaetica et Cartographica Sinica (《测绘学报》), 2016, No. 11, pp. 1328-1334 (7 pages)
Indexed in: EI, CSCD, Peking University Core Journals (北大核心)
Funding
National Natural Science Foundation of China (No. 41671383)
Keywords
low-sampling-rate floating car data
trajectory matching
Markov decision process
reinforcement learning