
Multi-objective vehicle following decision algorithm based on reinforcement learning

Cited by: 8
Abstract: To meet the comfort requirements of the car-following mode of adaptive cruise control while also accounting for vehicle safety and driving efficiency, and to address the poor generalization and comfort of existing algorithms, a new multi-objective vehicle following decision algorithm is proposed based on the deep deterministic policy gradient (DDPG) algorithm. A Markov decision process (MDP) model of the vehicle following process is established from the relative longitudinal kinematics of the following vehicle and the lead vehicle. Combined with a minimum safety distance model, an efficient, comfortable, and safe vehicle following decision algorithm is designed. To speed up model convergence, the storage and sampling strategy for DDPG experience samples is improved: samples are stored and sampled by class according to their importance. To reflect the multi-objective structure of the following task, the reward function is designed in a modular way. Finally, the algorithm is tested in a simulation environment; even when the test environment differs from the training environment, it still completes the following task, and it outperforms existing following algorithms.
Authors: DENG Xiao-hao; HOU Jin; TAN Guang-hong; WAN Bin-yang; CAO Ting-ting (School of Information Science and Technology, Southwest Jiaotong University, Chengdu 611756, China)
Source: Control and Decision (《控制与决策》), indexed in EI, CSCD, and the Peking University Core Journals list; 2021, No. 10, pp. 2497-2503 (7 pages)
Funding: Open Project of the State Key Laboratory of CAD&CG, Zhejiang University (A1923); Chengdu Science and Technology Project (2015-HM01-00050-SF).
Keywords: autonomous decision; vehicle following; semi-autonomous driving; reinforcement learning; deep deterministic policy gradient; Markov decision process
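
The abstract says that experience samples are stored and sampled by class according to their importance, but it does not give the importance measure or the sampling ratio. The following is only a minimal sketch of that idea, not the authors' implementation. It assumes two classes ("important" and "ordinary"), a scalar importance score supplied by the caller (for example an absolute TD error), and a fixed share of each mini-batch drawn from the important pool. The names `ClassifiedReplayBuffer`, `threshold`, and `important_fraction` are hypothetical.

```python
import random
from collections import deque


class ClassifiedReplayBuffer:
    """Replay buffer that keeps transitions in two pools by importance.

    Assumption: the two-pool split, the threshold rule, and the fixed
    sampling ratio are illustrative choices, not taken from the paper.
    """

    def __init__(self, capacity=10000, important_fraction=0.5,
                 threshold=1.0, seed=None):
        # Each pool is a bounded FIFO queue; old samples are dropped first.
        self.important = deque(maxlen=capacity)
        self.ordinary = deque(maxlen=capacity)
        self.important_fraction = important_fraction
        self.threshold = threshold
        self.rng = random.Random(seed)

    def store(self, transition, importance):
        """Route a transition to a pool based on its importance score."""
        if importance >= self.threshold:
            self.important.append(transition)
        else:
            self.ordinary.append(transition)

    def sample(self, batch_size):
        """Draw a mini-batch with a target share from the important pool."""
        n_imp = min(int(round(batch_size * self.important_fraction)),
                    len(self.important))
        n_ord = min(batch_size - n_imp, len(self.ordinary))
        # If the ordinary pool is short, top up from the important pool.
        n_imp = min(batch_size - n_ord, len(self.important))
        batch = (self.rng.sample(list(self.important), n_imp)
                 + self.rng.sample(list(self.ordinary), n_ord))
        self.rng.shuffle(batch)
        return batch

    def __len__(self):
        return len(self.important) + len(self.ordinary)


if __name__ == "__main__":
    buf = ClassifiedReplayBuffer(important_fraction=0.5, threshold=1.0, seed=0)
    for i in range(10):
        # Transitions are (state, action, reward, next_state) placeholders.
        buf.store(("imp", i), importance=2.0)
        buf.store(("ord", i), importance=0.1)
    print(len(buf.sample(8)))
```

In a DDPG training loop, `sample` would take the place of the uniform draw from a single replay memory. The returned batch can be smaller than `batch_size` while both pools are still filling.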
