Abstract
To address the slow convergence and low learning efficiency of traditional Q-learning, a robot path planning algorithm based on stage Q-learning is proposed. First, the exploration step size of each stage is set according to the scale of the environment, which reduces repeated search. Second, a reward pool and a reward threshold are set to ensure that the exploration in each stage is optimal. Finally, the optimal paths of the individual stages are combined into the global optimal path. Simulation experiments show that, compared with traditional Q-learning, the stage Q-learning algorithm improves learning efficiency and convergence speed, so that the robot can quickly find a collision-free path in a complex environment.
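The abstract names three steps (a per-stage step size tied to the environment scale, a reward pool with a reward threshold, and concatenation of stage paths) without giving formulas. The following is a minimal sketch of how such a scheme could look on a grid map, not the authors' implementation. Everything in it is an assumption made for illustration: the grid `GRID`, the reward values, the rule `stage_len = (rows + cols) // 3`, the threshold of 1.0, and the choice to keep the highest-reward rollout from the pool.

```python
import random

# Illustrative sketch only: every name and constant here is an assumption,
# not a value or procedure taken from the paper.
MOVES = [(-1, 0), (1, 0), (0, -1), (0, 1)]  # up, down, left, right
GRID = [  # 0 = free cell, 1 = obstacle (hypothetical demo map)
    [0, 0, 0, 0, 0],
    [0, 1, 0, 0, 0],
    [0, 0, 0, 1, 0],
    [0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0],
]


def dist(a, b):
    """Manhattan distance between two grid cells."""
    return abs(a[0] - b[0]) + abs(a[1] - b[1])


def step(grid, s, move, goal):
    """Apply one move; a blocked move keeps the robot in place and is penalized."""
    r, c = s[0] + move[0], s[1] + move[1]
    if not (0 <= r < len(grid) and 0 <= c < len(grid[0])) or grid[r][c] == 1:
        return s, -5.0
    nxt = (r, c)
    # Assumed reward: +10 at the goal, otherwise the change in distance to it.
    reward = 10.0 if nxt == goal else float(dist(s, goal) - dist(nxt, goal))
    return nxt, reward


def run_episode(grid, q, start, goal, steps, eps, rng, alpha=0.5, gamma=0.9):
    """One epsilon-greedy Q-learning rollout of at most `steps` moves."""
    s, cells, total = start, [start], 0.0
    for _ in range(steps):
        if s == goal:
            break
        qs = q.setdefault(s, [0.0] * 4)
        a = rng.randrange(4) if rng.random() < eps else qs.index(max(qs))
        nxt, reward = step(grid, s, MOVES[a], goal)
        future = 0.0 if nxt == goal else max(q.setdefault(nxt, [0.0] * 4))
        qs[a] += alpha * (reward + gamma * future - qs[a])  # standard Q update
        if nxt != s:
            cells.append(nxt)  # record only real moves, so the path stays connected
        s, total = nxt, total + reward
    return cells, total


def stage_q_learning(grid, start, goal, seed=0, episodes=200, max_stages=50):
    """Plan stage by stage; returns (path, reached_goal)."""
    rng = random.Random(seed)
    q = {}
    stage_len = max(2, (len(grid) + len(grid[0])) // 3)  # assumed step-size rule
    threshold = 1.0  # assumed: a stage must gain at least one cell of progress
    path, s = [start], start
    for _ in range(max_stages):
        if s == goal:
            break
        pool = []  # reward pool: (total reward, visited cells) per rollout
        for _ in range(episodes):
            cells, total = run_episode(grid, q, s, goal, stage_len, 0.3, rng)
            pool.append((total, cells))
        best_total, best_cells = max(pool, key=lambda item: item[0])
        if best_total < threshold:
            continue  # nothing in the pool passed the threshold; explore again
        path.extend(best_cells[1:])  # append this stage's best segment
        s = best_cells[-1]
    return path, s == goal
```

Calling `stage_q_learning(GRID, (0, 0), (4, 4))` returns the concatenated stage segments and a flag indicating whether the goal was reached. Because each stage only needs to learn values for cells within `stage_len` moves of its start, the rollouts stay short. The cost of this simplification is that a map requiring a long detour may never pass the fixed threshold.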
Authors
YANG Xiuxia (杨秀霞)
GAO Hengjie (高恒杰)
LIU Wei (刘伟)
ZHANG Yi (张毅)
Affiliations: Coast Guard College, Naval Aviation University, Yantai 264001, China; School of Combat Service, Naval Aviation University, Yantai 264001, China
Source
Journal of Ordnance Equipment Engineering (《兵器装备工程学报》)
Indexed in: CSCD; Peking University Core Journals (北大核心)
2022, No. 5, pp. 197-203 (7 pages)
Funding
Natural Science Foundation of Shandong Province (ZR2020MF090).
Keywords
reinforcement learning
robot
path planning
Q-learning
optimal exploration in stages