Reinforcement Learning Theory, Algorithms and Application (cited by: 19)
Abstract: Reinforcement learning (RL) developed from animal learning theory. It requires no prior knowledge: the learner acquires knowledge by continually interacting with its environment and selects its actions autonomously, improving its behavior policy as it goes. This capacity for autonomous learning has drawn wide attention in behavior learning for autonomous robots. This paper surveys the basic principles of reinforcement learning and its main algorithms, including the TD algorithm, Q-learning and R-learning. It closes with applications of reinforcement learning and the open research issues in multi-robot systems.
Source: Journal of Hebei University of Technology (CAS), 2006, No. 6, pp. 34-38 (5 pages)
Keywords: reinforcement learning; TD algorithm; Q-learning; R-learning
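Q-learning is one of the algorithms the abstract names. As a rough illustration only (not taken from the paper), the sketch below runs tabular Q-learning on a tiny chain of states. The chain environment, the function name `q_learning_chain`, and all hyperparameter values are assumptions made for this example. The agent starts with no prior knowledge (all values zero) and learns purely from interaction, which is the property the abstract emphasizes.

```python
import random


def q_learning_chain(n_states=5, episodes=500, alpha=0.5, gamma=0.9,
                     epsilon=0.1, seed=0):
    """Tabular Q-learning on a hypothetical 1-D chain.

    States are 0..n_states-1; action 0 moves left, action 1 moves right.
    Entering the last state yields reward 1 and ends the episode.
    """
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(n_states)]  # no prior knowledge
    goal = n_states - 1
    for _ in range(episodes):
        s = 0
        while s != goal:
            # Epsilon-greedy selection; ties are broken at random.
            if rng.random() < epsilon or q[s][0] == q[s][1]:
                a = rng.randrange(2)
            else:
                a = 0 if q[s][0] > q[s][1] else 1
            s_next = max(s - 1, 0) if a == 0 else s + 1
            r = 1.0 if s_next == goal else 0.0
            # One-step target; the terminal state contributes no future value.
            future = 0.0 if s_next == goal else gamma * max(q[s_next])
            q[s][a] += alpha * (r + future - q[s][a])
            s = s_next
    return q


if __name__ == "__main__":
    for state, (left, right) in enumerate(q_learning_chain()):
        print(state, round(left, 3), round(right, 3))
```

In this toy setting the learned values for moving right should end up higher than those for moving left in every non-terminal state, so the greedy policy heads straight for the goal.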