Abstract
To address the low learning efficiency and slow convergence of reinforcement-learning-based path planning for mobile robots, an improved Q-learning algorithm is proposed. First, a dynamic action set strategy is introduced, which selects the robot's action set according to the positions of its current point and the end point. Second, a heuristic reward and punishment function is added to the algorithm, so that different actions earn the robot different rewards. Together, these changes improve the learning efficiency of the algorithm and speed up its convergence. Finally, simulation experiments in a grid environment show that the improved algorithm converges substantially faster than the traditional Q-learning algorithm.
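The two ideas in the abstract can be illustrated with a minimal sketch. The abstract does not give the exact action-selection rule or reward values, so everything concrete here is an assumption made for illustration: the eight-neighbour move set, the rule that keeps only moves not pointing away from the end point, the reward constants, and all names (`dynamic_actions`, `step`, `train`) are hypothetical and are not taken from the paper.

```python
import math
import random
from collections import defaultdict

# Hypothetical 8-neighbour moves on a grid, as (d_row, d_col).
MOVES = [(-1, 0), (1, 0), (0, -1), (0, 1), (-1, -1), (-1, 1), (1, -1), (1, 1)]

def sign(x):
    return (x > 0) - (x < 0)

def dynamic_actions(state, goal):
    """Assumed rule: keep moves whose direction does not point away from the goal."""
    dr, dc = sign(goal[0] - state[0]), sign(goal[1] - state[1])
    return [m for m in MOVES if m[0] * dr + m[1] * dc >= 0]

def step(state, move, goal, obstacles, size):
    """Apply a move and return (next_state, reward, done). Reward values are assumptions."""
    nxt = (state[0] + move[0], state[1] + move[1])
    in_grid = 0 <= nxt[0] < size and 0 <= nxt[1] < size
    if not in_grid or nxt in obstacles:
        return state, -1.0, False          # penalty: blocked, robot stays put
    if nxt == goal:
        return nxt, 10.0, True             # terminal reward at the end point
    # Heuristic term: moves that reduce the distance to the goal earn more.
    progress = math.dist(state, goal) - math.dist(nxt, goal)
    return nxt, 0.1 * progress - 0.05, False

def train(size=6, start=(0, 0), goal=(5, 5), obstacles=frozenset({(2, 2), (3, 2)}),
          episodes=300, alpha=0.5, gamma=0.9, epsilon=0.1, seed=0):
    """Tabular Q-learning restricted to the dynamic action set of each state."""
    rng = random.Random(seed)
    q = defaultdict(float)                 # keyed by (state, move)
    for _ in range(episodes):
        state = start
        for _ in range(4 * size * size):   # cap on episode length
            actions = dynamic_actions(state, goal)
            if rng.random() < epsilon:
                move = rng.choice(actions)
            else:
                move = max(actions, key=lambda m: q[(state, m)])
            nxt, reward, done = step(state, move, goal, obstacles, size)
            best_next = 0.0 if done else max(
                q[(nxt, m)] for m in dynamic_actions(nxt, goal))
            # Standard Q-learning update.
            q[(state, move)] += alpha * (reward + gamma * best_next - q[(state, move)])
            state = nxt
            if done:
                break
    return q
```

Under these assumptions, each non-goal state exposes five of the eight moves, which shrinks the table the agent has to explore; the shaped reward then separates the remaining moves by how much progress they make. Whether this matches the paper's actual design cannot be determined from the abstract alone.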
Authors
Pan Guoqian (潘国倩); Zhou Xinzhi (周新志)
College of Electronics and Information Engineering, Sichuan University, Chengdu 610065
Source
Modern Computer (《现代计算机》), 2022, No. 10, pp. 57-61 (5 pages)
Funding
Civil Aviation Joint Research Fund of the Civil Aviation Administration of China (U1933123).