Abstract
To address decision-making and control in path planning for autonomous vehicles, this paper targets two weaknesses of the deep deterministic policy gradient (DDPG) algorithm in unknown environments: as the search space grows, training becomes inefficient and convergence becomes unstable. An improved, reward-guided algorithm is proposed. First, within each episode, reward-based prioritized experience replay is used to reduce the blindness of DDPG's random exploration and improve the learning efficiency of the intelligent vehicle. Then, between episodes, high-quality trajectories are selected by reward to guide the vehicle's exploration of complex spaces and yield a stable control policy. Finally, the method is evaluated in an open-source intelligent driving simulation environment. The experimental results show that the improved DDPG algorithm outperforms the original algorithm, with gains in both training efficiency and convergence stability.
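The two reward-guided mechanisms named in the abstract can be illustrated with a minimal Python sketch. The abstract does not give the priority formula or the trajectory selection rule, so everything here is an assumption made for illustration: the class name `RewardPrioritizedReplay`, the function `select_elite_trajectories`, the shifted-reward priority, and the `keep_fraction` parameter are hypothetical and are not taken from the paper.

```python
import random
from collections import deque


class RewardPrioritizedReplay:
    """Replay buffer that samples transitions with probability increasing in reward."""

    def __init__(self, capacity, eps=1e-3):
        self.buffer = deque(maxlen=capacity)  # oldest transitions are evicted first
        self.eps = eps  # keeps the lowest-reward transition sampleable

    def add(self, state, action, reward, next_state, done):
        self.buffer.append((state, action, reward, next_state, done))

    def sample(self, batch_size, rng=random):
        if not self.buffer:
            raise ValueError("cannot sample from an empty buffer")
        rewards = [t[2] for t in self.buffer]
        r_min = min(rewards)
        # Assumed priority: reward shifted so that every weight is strictly positive.
        weights = [r - r_min + self.eps for r in rewards]
        # Sampling is with replacement, proportional to the weights.
        return rng.choices(list(self.buffer), weights=weights, k=batch_size)


def select_elite_trajectories(trajectories, keep_fraction=0.2):
    """Keep the top fraction of episodes, ranked by total episode reward."""
    ranked = sorted(
        trajectories,
        key=lambda traj: sum(t[2] for t in traj),
        reverse=True,
    )
    n_keep = max(1, int(len(ranked) * keep_fraction))
    return ranked[:n_keep]
```

In a DDPG training loop, a buffer like this would take the place of uniform sampling when drawing minibatches within an episode, and the selected trajectories could be replayed or re-inserted between episodes. How the paper actually combines the two steps is not stated in the abstract.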
Authors
陈建文
张小俊
张明路
CHEN Jianwen; ZHANG Xiaojun; ZHANG Minglu (School of Mechanical Engineering, Hebei University of Technology, Tianjin 300400)
Source
《汽车实用技术》
2022, No. 1, pp. 28-31 (4 pages)
Automobile Applied Technology
Keywords
Path planning
Decision control
Deep deterministic policy gradient
Reward guidance
Prioritized experience replay