Abstract
Q-learning is a classical reinforcement learning algorithm: it is simple to use, requires no model of the environment, and is widely applied in mobile robot path planning. However, when the state space and action space are large, classical Q-learning suffers from low learning efficiency, slow convergence, and a tendency to become trapped in locally optimal solutions. By introducing a neural network model and using map information to compute a state potential value, the design of the reward function is optimized. The resulting reward function provides prior knowledge to the Q(λ)-learning algorithm, avoiding blind search during training, while its incentives prevent convergence to local optima. Simulation results show that the improved path planning method converges considerably faster and that the trained path is globally optimal.
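The abstract does not give implementation details, but the core mechanism it describes, a potential-based shaping term added to the reward of a Q(λ) learner, can be sketched. The following minimal Python example assumes a small grid map; the hand-coded negative-distance potential phi() stands in for the paper's neural-network state potential, and the grid size, start/goal cells, and hyperparameters are illustrative assumptions rather than values from the paper.

import numpy as np

# Tabular Q(lambda) with potential-based reward shaping on a grid map.
# phi() below is a hypothetical stand-in for the paper's neural-network
# state potential; any map-derived potential could replace it.
GRID = (10, 10)                               # grid size (assumption)
GOAL = (9, 9)                                 # goal cell (assumption)
ACTIONS = [(-1, 0), (1, 0), (0, -1), (0, 1)]  # up, down, left, right
ALPHA, GAMMA, LAMBDA, EPS = 0.1, 0.95, 0.9, 0.1

def phi(s):
    # State potential: higher (less negative) closer to the goal.
    return -(abs(s[0] - GOAL[0]) + abs(s[1] - GOAL[1]))

def step(s, a):
    # Deterministic grid transition with a sparse base reward.
    ns = (min(max(s[0] + a[0], 0), GRID[0] - 1),
          min(max(s[1] + a[1], 0), GRID[1] - 1))
    return ns, (1.0 if ns == GOAL else -0.01)

Q = np.zeros(GRID + (len(ACTIONS),))

for episode in range(300):
    E = np.zeros_like(Q)                      # eligibility traces
    s = (0, 0)                                # start cell (assumption)
    for t in range(1000):                     # step cap per episode
        greedy_a = int(np.argmax(Q[s]))
        a = np.random.randint(len(ACTIONS)) if np.random.rand() < EPS else greedy_a
        ns, r = step(s, ACTIONS[a])
        # Potential-based shaping: r' = r + gamma*phi(s') - phi(s)
        r_shaped = r + GAMMA * phi(ns) - phi(s)
        # Watkins's Q(lambda) update with the shaped reward
        td_error = r_shaped + GAMMA * np.max(Q[ns]) - Q[s + (a,)]
        E[s + (a,)] += 1.0
        Q += ALPHA * td_error * E
        # Decay traces after a greedy action, cut them after exploration
        E = E * (GAMMA * LAMBDA) if a == greedy_a else np.zeros_like(Q)
        s = ns
        if s == GOAL:
            break

Because the shaping term has the form gamma*phi(s') - phi(s), it changes the speed of learning without changing which policy is optimal (the standard potential-based shaping result), which matches the abstract's claim that the trained path remains globally optimal.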
Authors
WANG Jian; ZHANG Ping-lu; ZHAO Zhong-ying; CHENG Xiao-peng (Special Robot BG, Shenyang SIASUN Robot & Automation Co., Ltd., Shenyang 110169, China; Department of Mechanical and Traffic Engineering, Shenyang Institute of Science and Technology, Shenyang 110167, China)
Source
Automation & Instrumentation (《自动化与仪表》)
2019, No. 9, pp. 1-4 (4 pages)
Keywords
path planning
neural network
reinforcement learning
mobile robot
reward function