Research on Vehicle Behavior Decision-Making Based on a Deep Reinforcement Learning Algorithm
Abstract: To address the long training time and slow convergence of the traditional DDPG algorithm, an improved algorithm is proposed that incorporates guided learning and an optimal experience replay mechanism into DDPG. In the early stage of training, the action output of the improved DDPG algorithm is the combined result of guided learning and the policy network; in the later stage, guided learning no longer participates in control. In addition, an experience pool separation technique is introduced: advantage experience samples and disadvantage experience samples are stored separately and drawn at random in a fixed ratio. Vehicle decision tests on the TORCS platform show that the improved DDPG algorithm effectively reduces training time, increases the effective driving distance, and improves algorithm efficiency.
Authors: CHEN Mingsong; ZHANG Zegong; WU Ranran; WU Yongrong (School of Information and Communication, Guilin University of Electronic Technology, Guilin 541004, China)
Source: Journal of Guilin University of Electronic Technology, 2022, No. 1, pp. 29-35 (7 pages)
Funding: Director's Fund of the Key Laboratory of Cognitive Radio and Information Processing, Ministry of Education (CRKH80102); Guilin University of Electronic Technology Graduate Education Innovation Program (2018YJCX29).
Keywords: deep deterministic policy gradient (DDPG) algorithm; guided learning; optimal experience replay; TORCS
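
The two mechanisms named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not say how a sample is classified as advantage or disadvantage, what the sampling ratio is, or how the guide's influence is phased out. The reward threshold, the 0.7 ratio, the linear decay schedule, and the names SplitReplayBuffer and guided_action are all assumptions made for illustration.

```python
import random
from collections import deque


class SplitReplayBuffer:
    """Two experience pools sampled at a fixed ratio (illustrative sketch)."""

    def __init__(self, capacity=10000, good_fraction=0.7, reward_threshold=0.0):
        self.good = deque(maxlen=capacity)   # "advantage" transitions
        self.bad = deque(maxlen=capacity)    # "disadvantage" transitions
        self.good_fraction = good_fraction   # assumed ratio, not from the paper
        self.reward_threshold = reward_threshold

    def add(self, state, action, reward, next_state, done):
        transition = (state, action, reward, next_state, done)
        # Assumption: a transition is "advantage" if its reward exceeds a threshold.
        pool = self.good if reward > self.reward_threshold else self.bad
        pool.append(transition)

    def sample(self, batch_size):
        # Draw a fixed share from the advantage pool and the rest from the other
        # pool; each share is capped by how many samples that pool holds.
        n_good = min(int(round(batch_size * self.good_fraction)), len(self.good))
        n_bad = min(batch_size - n_good, len(self.bad))
        batch = random.sample(list(self.good), n_good)
        batch += random.sample(list(self.bad), n_bad)
        random.shuffle(batch)
        return batch


def guided_action(policy_action, guide_action, episode, guide_episodes=100):
    """Blend guide and policy actions early in training; policy only afterwards.

    Assumption: the guide's weight decays linearly to zero over guide_episodes.
    """
    if episode >= guide_episodes:
        return list(policy_action)
    w = 1.0 - episode / guide_episodes  # weight on the guide's action
    return [w * g + (1.0 - w) * p for g, p in zip(guide_action, policy_action)]
```

In a DDPG training loop, guided_action would wrap the actor's output before the action is sent to the simulator, and SplitReplayBuffer.sample would take the place of uniform sampling from a single replay buffer.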