Local Path Planning of Robot Based on Improved PPO Algorithm (cited by: 4)
Abstract Training a robot local path planning model with reinforcement learning suffers from slow convergence, and the robot can fall into deadlock areas that make the target unreachable. To address these problems, the traditional Proximal Policy Optimization (PPO) algorithm is improved by introducing a Long Short-Term Memory (LSTM) neural network and a virtual target point method, yielding the LSTM-PPO algorithm. The fully connected layer in the PPO neural network structure is replaced with an LSTM memory unit, which controls how much sample information is remembered and forgotten, so that samples with high rewards are learned first and the reward accumulates faster to optimize the model. On this basis, a virtual target point is added: when the environmental information collected by the radar sensors indicates that the robot has fallen into a deadlock area, the guidance given by the goal point is abandoned, so that the robot leaves the trap area and then heads toward the target point, which reduces unnecessary training in deadlock areas. The LSTM-PPO algorithm is verified by simulation in special-obstacle and mixed-obstacle scenes and compared with the traditional PPO algorithm and the improved SDAS-PPO algorithm in terms of average reward and path length. The results show that LSTM-PPO reaches the reward peak fastest during training in both scenes, accelerates model convergence, reduces redundant path segments, improves path smoothness, and shortens path length.
Authors LIU Guoming; LI Caihong; LI Yongdi; ZHANG Guosheng; ZHANG Yaoyu; GAO Tengteng (School of Computer Science and Technology, Shandong University of Technology, Zibo 255000, Shandong, China)
Source Computer Engineering (《计算机工程》), indexed in CAS, CSCD, and the Peking University Core Journals list; 2023, Issue 2, pp. 119-126, 135 (9 pages)
Funding General Program of the National Natural Science Foundation of China (61473179, 61973184).
Keywords robot; local path planning; Long Short-Term Memory (LSTM) neural network; Proximal Policy Optimization (PPO) algorithm; virtual target point
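
The abstract describes the virtual target point idea only in outline, so the following Python sketch is an illustration under stated assumptions, not the authors' method. It assumes the range sensor returns a list of beam bearings (relative to the robot heading) with matching distances, treats the robot as deadlocked when every beam pointing roughly toward the goal is shorter than a threshold, and places the virtual target along the most open beam. The function name `select_target`, the deadlock test, and the parameters `block_dist`, `sector`, and `lookahead` are all hypothetical.

```python
import math


def wrap(angle):
    """Wrap an angle in radians to the interval [-pi, pi]."""
    return math.atan2(math.sin(angle), math.cos(angle))


def select_target(pose, goal, bearings, ranges,
                  block_dist=0.6, sector=math.radians(30), lookahead=1.0):
    """Return (target_point, deadlocked) for the current sensor reading.

    pose     -- (x, y, heading) of the robot, heading in radians
    goal     -- (x, y) of the real goal point
    bearings -- beam angles relative to the heading, in radians
    ranges   -- measured distance for each beam, same order as bearings

    The deadlock test and the placement rule are assumptions made for
    this sketch; the abstract does not specify either.
    """
    x, y, heading = pose
    goal_bearing = wrap(math.atan2(goal[1] - y, goal[0] - x) - heading)

    # Beams that point within `sector` of the goal direction.
    toward_goal = [r for b, r in zip(bearings, ranges)
                   if abs(wrap(b - goal_bearing)) <= sector]

    # Assumed deadlock rule: every goal-facing beam is blocked nearby.
    deadlocked = bool(toward_goal) and max(toward_goal) < block_dist
    if not deadlocked:
        return goal, False

    # Drop the goal's guidance and aim along the most open beam instead.
    best_bearing, best_range = max(zip(bearings, ranges), key=lambda br: br[1])
    dist = min(best_range, lookahead)
    angle = heading + best_bearing
    virtual = (x + dist * math.cos(angle), y + dist * math.sin(angle))
    return virtual, True
```

In a training loop, the returned point would stand in for the goal when computing the goal-directed part of the reward or observation until the deadlock test clears, at which point the real goal is used again.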