

Research on Decision-Making at Intersection Without Traffic Lights Based on Deep Reinforcement Learning
Abstract: Left-turn intersections without traffic lights are among the most dangerous scenes in autonomous driving, and achieving efficient and safe left-turn decision-making is a major challenge in the field. Deep Reinforcement Learning (DRL) algorithms have broad prospects in autonomous driving decision-making. However, DRL suffers from low sample efficiency in autonomous driving scenarios, and its reward functions are difficult to design. Therefore, a DRL algorithm based on expert priors, abbreviated as CBAM-BC SAC, is proposed to solve these problems. First, the Scalable Multi-Agent RL Training School (SMARTS) simulation platform is used to obtain expert prior knowledge. Then, a Convolutional Block Attention Module (CBAM) is used to improve Behavior Cloning (BC), pretraining an imitation policy on the expert prior knowledge. Finally, the imitation policy guides the learning process of the DRL algorithm, which is verified on left-turn decision-making at an intersection without traffic lights. Experimental results indicate that the DRL algorithm based on expert priors outperforms conventional DRL algorithms: it not only eliminates the workload of manually designing reward functions, but also significantly improves sample efficiency and achieves better performance. In the left-turn scene at an intersection without traffic lights, the CBAM-BC SAC algorithm improves the average traversal success rate by 14.2 and 2.2 percentage points compared with the conventional DRL algorithm (SAC) and the DRL algorithm based on classic BC (BC SAC), respectively.
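The abstract names the CBAM attention step but does not spell it out. As an illustration only, here is a heavily simplified numpy sketch of channel-then-spatial attention; the single shared linear map `w_channel` and the pooling choices are assumptions made for brevity and are not the paper's actual module:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def cbam(feature_map, w_channel):
    """Simplified CBAM: channel attention, then spatial attention.

    feature_map: (C, H, W) array; w_channel: (C, C) shared weight
    (a single linear layer here for brevity).
    """
    # Channel attention: pool over spatial dims, pass both pooled
    # vectors through the shared weights, squash to (0, 1).
    avg_pool = feature_map.mean(axis=(1, 2))                       # (C,)
    max_pool = feature_map.max(axis=(1, 2))                        # (C,)
    ch_att = sigmoid(w_channel @ avg_pool + w_channel @ max_pool)  # (C,)
    x = feature_map * ch_att[:, None, None]

    # Spatial attention: pool over channels, squash to an (H, W) mask.
    sp_att = sigmoid(x.mean(axis=0) + x.max(axis=0))               # (H, W)
    return x * sp_att[None, :, :]

refined = cbam(np.random.rand(8, 4, 4), np.eye(8))
print(refined.shape)  # (8, 4, 4)
```

In the paper's pipeline, such a refined feature map would feed the BC network during pretraining, so the cloned policy attends to the decision-relevant parts of the observation before SAC fine-tuning.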
Authors: 傅明建 (FU Mingjian); 郭福强 (GUO Fuqiang) — College of Computer and Data Science, Fuzhou University, Fuzhou 350108, Fujian, China
Source: Computer Engineering (《计算机工程》; CAS, CSCD, Peking University Core), 2024, No. 5, pp. 91-99 (9 pages)
Funding: Natural Science Foundation of Fujian Province (2022J01117); Education and Scientific Research Project for Young and Middle-aged Teachers of Fujian Province (JAT200007); Fuzhou University Talent Start-up Fund (XRC-23055)
Keywords: Deep Reinforcement Learning (DRL); autonomous driving; imitation learning; Behavioral Cloning (BC); driving decision-making