Journal Article

Solution to reinforcement learning problems with artificial potential field (Cited by: 3)

Abstract: A novel method was designed to solve reinforcement learning problems with an artificial potential field (APF). First, a reinforcement learning problem was transformed into a path planning problem using APF, which is a natural way to model this class of problem. Second, a new APF algorithm was proposed that uses a virtual water-flow concept to overcome the local-minimum problem of potential field methods. The performance of the new method was tested on a gridworld problem known as the key-and-door maze. The experimental results show that good, deterministic policies are found in almost all simulations within 45 trials. In comparison with WIERING's HQ-learning system, which needs 20 000 trials to reach a stable solution, the proposed method obtains an optimal and stable policy far more quickly. The new method is therefore a simple and effective way to find an optimal solution to a reinforcement learning problem.
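The abstract names two technical ingredients: a potential field over the gridworld that turns the learning task into path planning, and a virtual water-flow rule for escaping local minima. As a rough illustration only, the sketch below uses the common quadratic attractive and inverse repulsive potential forms and substitutes a plain random-kick escape for the paper's unspecified water-flow heuristic; all names and gains (K_ATT, K_REP, RHO0, descend) are invented for this example, not taken from the paper.

```python
import numpy as np

# Minimal gridworld APF sketch -- an illustration of the general idea,
# not the paper's algorithm. Quadratic attractive potential toward the
# goal, inverse repulsive potential near obstacles; a random kick stands
# in for the paper's (unspecified) virtual water-flow escape.

K_ATT, K_REP, RHO0 = 1.0, 5.0, 2.0   # assumed gains and repulsion radius

def potential(cell, goal, obstacles):
    """Total potential U(q) = U_att(q) + U_rep(q) at a grid cell."""
    q, g = np.asarray(cell, float), np.asarray(goal, float)
    u = 0.5 * K_ATT * np.sum((q - g) ** 2)           # attractive term
    for obs in obstacles:
        rho = np.linalg.norm(q - np.asarray(obs, float))
        if rho == 0:
            return np.inf                            # obstacle cell: impassable
        if rho <= RHO0:                              # repulsion acts only nearby
            u += 0.5 * K_REP * (1.0 / rho - 1.0 / RHO0) ** 2
    return u

def descend(start, goal, obstacles, shape, max_steps=500, rng=None):
    """Greedy descent on the potential; random kick at a local minimum."""
    rng = rng or np.random.default_rng(0)
    moves = [(-1, 0), (1, 0), (0, -1), (0, 1)]
    path, cell = [start], start
    for _ in range(max_steps):
        if cell == goal:
            break
        nbrs = [(cell[0] + dr, cell[1] + dc) for dr, dc in moves
                if 0 <= cell[0] + dr < shape[0] and 0 <= cell[1] + dc < shape[1]]
        best = min(nbrs, key=lambda c: potential(c, goal, obstacles))
        if potential(best, goal, obstacles) >= potential(cell, goal, obstacles):
            best = nbrs[rng.integers(len(nbrs))]     # escape a local minimum
        cell = best
        path.append(cell)
    return path

# Toy run: 5x5 grid, one obstacle between start and goal.
print(descend((0, 0), (4, 4), obstacles=[(2, 2)], shape=(5, 5)))
```

A genuine water-flow escape would presumably bias the agent along a connected low-potential channel rather than kicking randomly; the random kick is only the simplest stand-in that keeps the sketch runnable.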
Source: Journal of Central South University of Technology (EI indexed), 2008, No. 4: 552-557 (6 pages)
Funding: Projects (30270496, 60075019, 60575012) supported by the National Natural Science Foundation of China
Keywords: reinforcement learning; path planning; mobile robot navigation; artificial potential field; virtual water-flow

References (14)

  • 1 ZOU Xiao-bing, CAI Zi-xing, SUN Guo-rong. Non-smooth environment modeling and global path planning for mobile robots[J]. Journal of Central South University of Technology, 2003, 10(3): 248-254. (Cited by 6)
  • 2 ZHU Xiao-cai, DONG Guo-hua, CAI Zi-xing, HU De-wen. Robust simultaneous tracking and stabilization of wheeled mobile robots not satisfying nonholonomic constraint[J]. Journal of Central South University of Technology, 2007, 14(4): 537-545. (Cited by 5)
  • 3 WEN Zhi-qiang, CAI Zi-xing. Global path planning approach based on ant colony optimization algorithm[J]. Journal of Central South University of Technology, 2006, 13(6): 707-712. (Cited by 5)
  • 4 BARTO A G, MAHADEVAN S. Recent advances in hierarchical reinforcement learning[J]. Discrete Event Dynamic Systems, 2003, 13(1/2): 41-77.
  • 5 KAELBLING L P, LITTMAN M L, MOORE A W. Reinforcement learning: A survey[J]. Journal of Artificial Intelligence Research, 1996, 4: 237-285.
  • 6 SUTTON R S, BARTO A G. Reinforcement learning: An introduction[M]. Cambridge: MIT Press, 1998.
  • 7 BANERJEE B, STONE P. General game learning using knowledge transfer[C]. Proceedings of the 20th International Joint Conference on Artificial Intelligence, 2007.
  • 8 ASADI M, HUBER M. Effective control knowledge transfer through learning skill and representation hierarchies[C]. Proceedings of the 20th International Joint Conference on Artificial Intelligence, 2007.
  • 9 KONIDARIS G, BARTO A. Autonomous shaping: Knowledge transfer in reinforcement learning[C]. Proceedings of the 23rd International Conference on Machine Learning, 2006.
  • 10 MEHTA N, NATARAJAN S, TADEPALLI P, FERN A. Transfer in variable-reward hierarchical reinforcement learning[C]. Workshop on Transfer Learning, Neural Information Processing Systems, 2005.

