
Path Planning Method Based on Neural Network and Q(λ)-learning

Cited by: 4
Abstract: Q-learning is a classical reinforcement learning algorithm that is simple to use and requires no environment model, and it is widely applied to mobile robot path planning. When the state and action spaces are large, however, classical Q-learning suffers from low learning efficiency, slow convergence, and a tendency to fall into local optima. This paper introduces a neural network model that uses map information to compute a potential value for each state, and uses these values to design an improved reward function. A well-designed reward function supplies prior knowledge to the Q(λ)-learning algorithm and avoids blind search during training, while its incentives keep the agent from becoming trapped in local optima. Simulation experiments show that the improved path planning method converges considerably faster and that the learned path is globally optimal.
Authors: WANG Jian (王健); ZHANG Ping-lu (张平陆); ZHAO Zhong-ying (赵忠英); CHENG Xiao-peng (程晓鹏) (Special Robot BG, Shenyang SIASUN Robot & Automation Co., Ltd., Shenyang 110169, China; Department of Mechanical and Traffic Engineering, Shenyang Institute of Science and Technology, Shenyang 110167, China)
Source: Automation & Instrumentation (《自动化与仪表》), 2019, No. 9, pp. 1-4 (4 pages)
Keywords: path planning; neural network; reinforcement learning; mobile robot; reward function
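
The abstract gives no implementation details, so the following is only a minimal sketch of the general idea under stated assumptions, not the authors' method. It shapes the reward with a state potential derived from the map and trains a tabular Watkins-style Q(λ) learner with replacing eligibility traces. The grid map, the constants, and every function name are hypothetical, and the potential here is a plain breadth-first-search distance to the goal, used as a stand-in for the neural network model that the paper describes.

```python
import random
from collections import deque

# Hypothetical 5x5 grid map (assumption): 0 = free cell, 1 = obstacle.
GRID = [
    [0, 0, 0, 0, 0],
    [0, 1, 1, 1, 0],
    [0, 0, 0, 1, 0],
    [0, 1, 0, 0, 0],
    [0, 0, 0, 1, 0],
]
START, GOAL = (0, 0), (4, 4)
ACTIONS = [(-1, 0), (1, 0), (0, -1), (0, 1)]  # up, down, left, right


def free(cell):
    r, c = cell
    return 0 <= r < len(GRID) and 0 <= c < len(GRID[0]) and GRID[r][c] == 0


def potential_from_map():
    """Stand-in for the paper's neural-network potential:
    the negative BFS distance from each free cell to the goal."""
    dist = {GOAL: 0}
    queue = deque([GOAL])
    while queue:
        r, c = queue.popleft()
        for dr, dc in ACTIONS:
            nxt = (r + dr, c + dc)
            if free(nxt) and nxt not in dist:
                dist[nxt] = dist[(r, c)] + 1
                queue.append(nxt)
    return {cell: -d for cell, d in dist.items()}


def step(state, action, phi, gamma):
    """Move if the target cell is free; return (next state, shaped reward, done)."""
    nxt = (state[0] + action[0], state[1] + action[1])
    if not free(nxt):
        nxt = state  # bumping into a wall or obstacle leaves the robot in place
    base = 10.0 if nxt == GOAL else -1.0
    shaping = gamma * phi[nxt] - phi[state]  # potential-based shaping term
    return nxt, base + shaping, nxt == GOAL


def train(episodes=300, alpha=0.5, gamma=0.95, lam=0.8, epsilon=0.1, seed=0):
    rng = random.Random(seed)
    phi = potential_from_map()
    q = {(s, a): 0.0 for s in phi for a in range(len(ACTIONS))}

    def greedy(s):
        return max(range(len(ACTIONS)), key=lambda a: q[(s, a)])

    for _ in range(episodes):
        traces, state = {}, START
        for _ in range(200):  # step cap per episode
            best = greedy(state)
            explore = rng.random() < epsilon
            action = rng.randrange(len(ACTIONS)) if explore else best
            if q[(state, action)] < q[(state, best)]:
                traces.clear()  # Watkins's cut: drop traces on a non-greedy action
            nxt, reward, done = step(state, ACTIONS[action], phi, gamma)
            target = reward if done else reward + gamma * q[(nxt, greedy(nxt))]
            delta = target - q[(state, action)]
            traces[(state, action)] = 1.0  # replacing trace
            for key in list(traces):
                q[key] += alpha * delta * traces[key]
                traces[key] *= gamma * lam
            state = nxt
            if done:
                break
    return q


def greedy_path(q, max_steps=50):
    """Follow the greedy action from START until GOAL or the step limit."""
    path, state = [START], START
    while state != GOAL and len(path) <= max_steps:
        a = max(range(len(ACTIONS)), key=lambda i: q[(state, i)])
        nxt = (state[0] + ACTIONS[a][0], state[1] + ACTIONS[a][1])
        state = nxt if free(nxt) else state
        path.append(state)
    return path
```

Calling `greedy_path(train())` returns the list of cells visited by the greedy policy after training. On this map the shortest route from `START` to `GOAL` takes 8 moves, so a learned path of 9 cells would be optimal. The shaping term `gamma * phi[nxt] - phi[state]` rewards moves that reduce the distance to the goal, which is one common way to give the learner the kind of prior knowledge the abstract mentions.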
