Abstract
To improve the success rate of lane keeping and enhance the navigation ability of unmanned vehicles, this paper proposes an end-to-end lane keeping algorithm based on an improved Proximal Policy Optimization (PPO) algorithm. An end-to-end lane keeping framework is built by replacing one hidden layer of the PPO network with an LSTM and redesigning the reward function; the framework couples the training policy with a simulator. It takes as inputs the RGB and depth images from the vehicle's front-facing camera together with environment variables around the unmanned vehicle such as its speed, lane departure value, and collision coefficient, and outputs the control commands of throttle, brake, and steering wheel angle. The algorithm is trained and tested on different maps in the AirSim simulation platform and compared with the original algorithm. The experimental results show that the improved LSTM-PPO algorithm can train an effective lane keeping policy, significantly reducing training time and increasing the robustness of the algorithm.
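As a rough illustration of the architectural change described in the abstract, the following Python (PyTorch) sketch replaces one hidden fully connected layer of a PPO actor with an LSTM and shapes a reward from speed, lane departure, and collision. All names and values here (LSTMPPOActor, shaped_reward, hidden sizes, reward weights) are illustrative assumptions, not the paper's actual implementation.

# Minimal sketch, assuming a PyTorch-style PPO actor whose second hidden
# layer is swapped for an LSTM so the policy keeps temporal driving context.
import torch
import torch.nn as nn

class LSTMPPOActor(nn.Module):
    def __init__(self, obs_dim: int, act_dim: int, hidden: int = 128):
        super().__init__()
        # First hidden layer: encodes the fused observation vector
        # (image features, speed, lane departure value, collision flag).
        self.encoder = nn.Sequential(nn.Linear(obs_dim, hidden), nn.Tanh())
        # Hidden layer replaced by an LSTM to carry memory across steps.
        self.lstm = nn.LSTM(hidden, hidden, batch_first=True)
        # Output head: mean of a Gaussian over [steering, throttle, brake].
        self.mu = nn.Linear(hidden, act_dim)
        self.log_std = nn.Parameter(torch.zeros(act_dim))

    def forward(self, obs_seq, hidden_state=None):
        # obs_seq: (batch, seq_len, obs_dim)
        x = self.encoder(obs_seq)
        x, hidden_state = self.lstm(x, hidden_state)
        dist = torch.distributions.Normal(self.mu(x), self.log_std.exp())
        return dist, hidden_state

def shaped_reward(speed: float, lane_departure: float, collided: bool) -> float:
    # Illustrative reward shaping in the spirit of the abstract:
    # reward forward speed, penalize lane departure, punish collisions.
    if collided:
        return -10.0
    return 0.1 * speed - 1.0 * abs(lane_departure)

# Usage: sample one action for a single time step.
actor = LSTMPPOActor(obs_dim=64, act_dim=3)
obs = torch.randn(1, 1, 64)          # one environment, one step
dist, h = actor(obs)
action = dist.sample()               # [steering, throttle, brake]

The recurrent hidden state h would be carried across consecutive simulator steps during rollout collection, which is what lets the policy exploit temporal context that a purely feed-forward PPO actor discards.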
Authors
SONG Jianhui; CUI Yongkuo (Shenyang Ligong University, Shenyang 110159)
Source
Communication & Information Technology
2024, No. 3, pp. 92-97 (6 pages)
Funding
Basic Scientific Research Project of Colleges and Universities of the Liaoning Provincial Department of Education (Grant No. LJKZ0275)
Shenyang Young and Middle-aged Scientific and Technological Innovation Talent Support Program (Grant No. RC210247).
Keywords
Autonomous driving
Reinforcement learning
Proximal policy optimization
Long short-term memory network