Research on Improved Decision Algorithm of Deep Deterministic Policy Gradient
Cited by: 1
Abstract: To address decision-making control in path planning for autonomous driving, this article targets two weaknesses that the deep deterministic policy gradient (DDPG) algorithm shows in unknown environments as the search space grows: low training efficiency and unstable convergence. It proposes an improved, reward-guided algorithm. First, within each episode, reward-based prioritized experience replay is used to reduce the blindness of DDPG's random exploration and improve the intelligent vehicle's learning efficiency. Then, between episodes, high-quality trajectories are selected by reward to guide the vehicle's exploration of complex spaces and yield a stable control policy. Finally, simulations are run in an open-source intelligent driving simulation environment. The experimental results show that the improved algorithm outperforms the original DDPG, with gains in both training efficiency and convergence stability.
Authors: CHEN Jianwen; ZHANG Xiaojun; ZHANG Minglu (School of Mechanical Engineering, Hebei University of Technology, Tianjin 300400)
Source: Automobile Applied Technology (《汽车实用技术》), 2022, No. 1, pp. 28-31 (4 pages)
Keywords: path planning; decision control; deep deterministic policy gradient; reward guidance; prioritized experience replay
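
The two reward-guided steps named in the abstract (reward-based prioritized replay within an episode, and reward-based selection of good trajectories between episodes) might look roughly like the sketch below. The abstract does not give the paper's priority formula or selection rule, so everything here is an assumption made for illustration: the class `RewardPrioritizedBuffer`, the function `select_elite_trajectories`, the parameters `alpha`, `eps`, and `keep`, and the choice of priority as a shifted reward raised to a power.

```python
import random
from dataclasses import dataclass, field


@dataclass
class RewardPrioritizedBuffer:
    """Replay buffer that samples transitions with probability tied to reward.

    Assumption: priority = (reward - min_reward + eps) ** alpha.
    This is an illustrative choice, not the formula from the paper.
    """
    capacity: int = 10000
    alpha: float = 0.6   # hypothetical exponent controlling how sharp the bias is
    eps: float = 1e-3    # keeps the lowest-reward transition sampleable
    transitions: list = field(default_factory=list)

    def add(self, state, action, reward, next_state, done):
        if len(self.transitions) >= self.capacity:
            self.transitions.pop(0)  # drop the oldest transition
        self.transitions.append((state, action, reward, next_state, done))

    def priorities(self):
        if not self.transitions:
            raise ValueError("buffer is empty")
        rewards = [t[2] for t in self.transitions]
        low = min(rewards)
        return [(r - low + self.eps) ** self.alpha for r in rewards]

    def sample(self, batch_size, rng=random):
        # Weighted sampling with replacement: higher reward, higher chance.
        return rng.choices(self.transitions, weights=self.priorities(), k=batch_size)


def select_elite_trajectories(trajectories, keep=2):
    """Return the `keep` trajectories with the highest total reward.

    Each trajectory is a list of (state, action, reward, next_state, done) tuples.
    """
    def total_reward(trajectory):
        return sum(t[2] for t in trajectory)

    return sorted(trajectories, key=total_reward, reverse=True)[:keep]
```

In a DDPG training loop, batches drawn by `sample` would feed the critic and actor updates, and the trajectories returned by `select_elite_trajectories` could be replayed or used to bias exploration in later episodes; how the paper actually uses them is not stated in the abstract.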