

Research on Decision-Making at Intersection Without Traffic Lights Based on Deep Reinforcement Learning
Abstract: Left-turn intersections without traffic lights are among the most dangerous scenes in autonomous driving, and achieving efficient and safe left-turn decision-making is a major challenge in the field. Deep Reinforcement Learning (DRL) algorithms have broad prospects in autonomous driving decision-making. However, DRL suffers from low sample efficiency in autonomous driving scenarios, and its reward functions are difficult to design. Therefore, a DRL algorithm based on expert priors, abbreviated as CBAM-BC SAC, is proposed to solve these problems. First, the Scalable Multi-Agent RL Training School (SMARTS) simulation platform is used to obtain expert prior knowledge. Then, a Convolutional Block Attention Module (CBAM) is used to improve Behavior Cloning (BC), pretraining an imitation policy on the expert prior knowledge. Finally, the imitation policy guides the learning process of the DRL algorithm, which is verified on left-turn decision-making at an intersection without traffic lights. Experimental results indicate that the DRL algorithm based on expert priors outperforms conventional DRL algorithms: it not only eliminates the workload of manually designing reward functions, but also significantly improves sample efficiency and achieves better performance. In the left-turn scene at an intersection without traffic lights, the CBAM-BC SAC algorithm improves the average traversal success rate by 14.2 and 2.2 percentage points compared with the conventional DRL algorithm (SAC) and the DRL algorithm based on classic BC (BC SAC), respectively.
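The abstract names the CBAM attention step but does not spell it out. As an illustration only, here is a heavily simplified numpy sketch of channel-then-spatial attention; the single shared linear map `w_channel` and the pooling choices are assumptions made for brevity and are not the paper's actual module:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def cbam(feature_map, w_channel):
    """Simplified CBAM: channel attention, then spatial attention.

    feature_map: (C, H, W) array; w_channel: (C, C) shared weight
    (a single linear layer here for brevity).
    """
    # Channel attention: pool over spatial dims, pass both pooled
    # vectors through the shared weights, squash to (0, 1).
    avg_pool = feature_map.mean(axis=(1, 2))                       # (C,)
    max_pool = feature_map.max(axis=(1, 2))                        # (C,)
    ch_att = sigmoid(w_channel @ avg_pool + w_channel @ max_pool)  # (C,)
    x = feature_map * ch_att[:, None, None]

    # Spatial attention: pool over channels, squash to an (H, W) mask.
    sp_att = sigmoid(x.mean(axis=0) + x.max(axis=0))               # (H, W)
    return x * sp_att[None, :, :]

refined = cbam(np.random.rand(8, 4, 4), np.eye(8))
print(refined.shape)  # (8, 4, 4)
```

In the paper's pipeline, such a refined feature map would feed the BC network during pretraining, so the cloned policy attends to the decision-relevant parts of the observation before SAC fine-tuning.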
Authors: 傅明建 (FU Mingjian); 郭福强 (GUO Fuqiang) — College of Computer and Data Science, Fuzhou University, Fuzhou 350108, Fujian, China
Source: Computer Engineering (《计算机工程》; CAS, CSCD, Peking University Core), 2024, No. 5, pp. 91-99 (9 pages)
Funding: Natural Science Foundation of Fujian Province (2022J01117); Education and Scientific Research Project for Young and Middle-aged Teachers of Fujian Province (JAT200007); Fuzhou University Talent Start-up Fund (XRC-23055)
Keywords: Deep Reinforcement Learning (DRL); autonomous driving; imitation learning; Behavioral Cloning (BC); driving decision-making