
An End-to-end Decision-making Method for Autonomous Driving Based on Twin Delayed Deep Deterministic Policy Gradient with Discrete Actions (Cited by: 2)
Abstract: Reinforcement-learning-based vehicle driving decision methods suffer from low learning efficiency and non-smooth action changes. To address these problems, an end-to-end decision-making method for autonomous driving is developed based on the Twin Delayed Deep Deterministic Policy Gradient with Discrete Actions (TD3WD) algorithm, which fuses networks with different action spaces. An additional Q network that outputs discrete actions is added to the network of the baseline Twin Delayed Deep Deterministic Policy Gradient (TD3) algorithm to assist exploration during training. The output actions of the TD3 network and the additional Q network are fused by weighting, and the fused action interacts with the environment, so that the environment is explored more fully and exploration efficiency improves. When the Critic network is updated, the output of the additional network is merged into the target action as noise, which encourages the agent to explore the environment and makes the action-value estimates more accurate. Image features extracted by a pre-trained network, rather than the raw images, serve as the state input, reducing the computational cost of training. The proposed method is validated in autonomous driving scenarios simulated on the Carla platform. The results show that in the training scenarios the proposed method learns more efficiently, converging about 30% faster than baseline algorithms such as TD3 and Deep Deterministic Policy Gradient (DDPG); in the testing scenarios, the converged policy performs better, with the average lane-crossing rate and the variation of steering-wheel angle reduced by 74.4% and 56.4%, respectively.
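The abstract describes the fusion mechanisms only at a high level, and the record includes no code. The sketch below is a minimal PyTorch illustration of the two ideas named above: weighted fusion of the continuous TD3 action with a greedy discrete action for environment interaction, and merging the discrete network's output into the target action as noise for the Critic update. All concrete choices here (network sizes, the DISCRETE_ACTIONS grid, the fusion weight w, and the exact noise form) are illustrative assumptions, not the authors' implementation.

```python
import torch
import torch.nn as nn

STATE_DIM = 512   # assumed size of the pre-trained CNN features used as state
ACTION_DIM = 2    # assumed action layout: [steering, throttle] in [-1, 1]

# Hypothetical coarse grid for the discrete action space; the paper does
# not publish its grid, so these values are placeholders.
DISCRETE_ACTIONS = torch.tensor([
    [-0.5, 0.5], [0.0, 0.5], [0.5, 0.5],
    [-0.5, 1.0], [0.0, 1.0], [0.5, 1.0],
])

class Actor(nn.Module):
    """Continuous TD3 actor: state features -> action in [-1, 1]^ACTION_DIM."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(STATE_DIM, 256), nn.ReLU(),
            nn.Linear(256, ACTION_DIM), nn.Tanh(),
        )

    def forward(self, state):
        return self.net(state)

class DiscreteQ(nn.Module):
    """Additional Q network: state features -> one Q value per grid action."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(STATE_DIM, 256), nn.ReLU(),
            nn.Linear(256, len(DISCRETE_ACTIONS)),
        )

    def forward(self, state):
        return self.net(state)

def fused_action(actor, q_net, state, w=0.7):
    """Weighted fusion of the continuous and discrete proposals (w is assumed;
    the abstract specifies weighted fusion but not the weight or schedule)."""
    a_cont = actor(state)                                    # TD3 proposal
    a_disc = DISCRETE_ACTIONS[q_net(state).argmax(dim=-1)]   # greedy discrete pick
    return (w * a_cont + (1.0 - w) * a_disc).clamp(-1.0, 1.0)

def noisy_target_action(target_actor, q_net, next_state, sigma=0.2, c=0.5):
    """Target action for the Critic update, with the discrete network's output
    merged in as noise; the exact noise form below is a guess at what
    'merged into the target actions as noise' means."""
    a_t = target_actor(next_state)
    a_d = DISCRETE_ACTIONS[q_net(next_state).argmax(dim=-1)]
    noise = ((a_d - a_t) * sigma).clamp(-c, c)    # pull toward the discrete pick
    return (a_t + noise).clamp(-1.0, 1.0)

# Usage: a batch of 4 feature vectors -> fused actions for environment steps.
state = torch.randn(4, STATE_DIM)
print(fused_action(Actor(), DiscreteQ(), state))
```

Biasing exploration toward a small discrete grid in this way gives the continuous actor informative transitions earlier in training, which is one plausible reading of the faster convergence the abstract reports.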
Authors: 杨璐 (YANG Lu), 王一权 (WANG Yiquan), 刘佳琦 (LIU Jiaqi), 段玉林 (DUAN Yulin), 张荣辉 (ZHANG Ronghui) (Tianjin Key Laboratory for Advanced Mechatronic System Design and Intelligent Control, School of Mechanical Engineering, Tianjin 300384, China; National Demonstration Center for Experimental Mechanical and Electrical Engineering Education, Tianjin University of Technology, Tianjin 300384, China; Institute of Agricultural Resources and Regional Planning, Chinese Academy of Agricultural Sciences, Beijing 100081, China; Guangdong Provincial Key Laboratory of Intelligent Transport System, Sun Yat-sen University, Guangzhou 510275, China)
Source: Journal of Transport Information and Safety (《交通信息与安全》, CSCD, Peking University Core), 2022, No. 1, pp. 144-152 (9 pages)
Funding: Supported by the International Agricultural Science Program of the Chinese Academy of Agricultural Sciences (CAAS-ZDRW202107), the National Natural Science Foundation of China (52172350, 51775565), and the Tianjin Research Innovation Project for Postgraduate Students (2020YJSZXS05).
Keywords: autonomous driving; end-to-end decision-making; deep reinforcement learning; action space