
Hierarchical reinforcement learning in dynamic environment

Cited by: 5
Abstract: Existing reinforcement learning approaches handle learning in dynamic environments poorly: the optimal policy must be re-learned whenever the environment changes, and if the interval between changes is shorter than the time the policy needs to converge, the learning algorithm cannot converge at all. Building on the Option framework for hierarchical reinforcement learning, this paper presents a hierarchical approach that adapts to dynamic environments. Exploiting the hierarchical structure of the learning task, the approach attends only to changes in the subgoal states of the hierarchical tasks and in the environment states inside the current Option, so that policy updates are confined to a small-scale local space or a low-dimensional high-level space, which speeds up learning. Simulation experiments on shortest-path planning between two points in a two-dimensional dynamic grid space show that the approach updates its policy markedly faster than previous methods, and that the convergence of the learning algorithm is less dependent on the frequency of environment change.
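To make the mechanism concrete, the following is a minimal Python sketch of the idea the abstract describes, not the authors' implementation: each Option owns a small local state region and a subgoal, and when the environment changes, only the Options whose regions contain a changed cell re-learn their internal policies, while the high-level policy over subgoals (which, in the Option framework, would be updated with the standard SMDP rule Q(s,o) <- Q(s,o) + a[R + g^k max_o' Q(s',o') - Q(s,o)]) is reused. All names here (DynamicGrid, GridOption, relearn_affected) are illustrative assumptions.

import random
from collections import defaultdict

MOVES = [(0, 1), (0, -1), (1, 0), (-1, 0)]

class DynamicGrid:
    """A small two-room grid world whose obstacle cells may change over time."""
    def __init__(self, w=8, h=4, obstacles=()):
        self.w, self.h, self.obstacles = w, h, set(obstacles)

    def step(self, s, a):
        x, y = s[0] + a[0], s[1] + a[1]
        free = 0 <= x < self.w and 0 <= y < self.h and (x, y) not in self.obstacles
        return (x, y) if free else s  # blocked moves leave the agent in place

class GridOption:
    """One temporally extended action: a local policy that drives the agent
    through its own region to a subgoal cell."""
    def __init__(self, region, goal):
        self.region, self.goal = set(region), goal
        self.q = defaultdict(float)  # internal Q over (cell, primitive move)

    def learn(self, grid, episodes=300, alpha=0.5, gamma=0.95, eps=0.2):
        """(Re-)learn the internal policy with flat Q-learning restricted to
        the option's region, i.e. the paper's small-scale local space."""
        self.q.clear()
        starts = sorted((self.region - {self.goal}) - grid.obstacles)
        for _ in range(episodes):
            s = random.choice(starts)
            for _ in range(200):  # step cap guards against unreachable subgoals
                if s == self.goal:
                    break
                a = (random.choice(MOVES) if random.random() < eps
                     else max(MOVES, key=lambda m: self.q[(s, m)]))
                s2 = grid.step(s, a)
                if s2 not in self.region:  # keep the update inside the local region
                    s2 = s
                r = 0.0 if s2 == self.goal else -1.0
                best = max(self.q[(s2, m)] for m in MOVES)
                self.q[(s, a)] += alpha * (r + gamma * best - self.q[(s, a)])
                s = s2

    def run(self, grid, s, cap=100):
        """Greedily execute the learned local policy; return (state, steps)."""
        k = 0
        while s != self.goal and k < cap:
            s = grid.step(s, max(MOVES, key=lambda m: self.q[(s, m)]))
            k += 1
        return s, k

def relearn_affected(options, grid, changed):
    """The paper's key point: after an environment change, re-learn ONLY the
    options whose local regions contain a changed cell; the high-level policy
    over subgoal states is reused unchanged."""
    for opt in options:
        if opt.region & set(changed):
            opt.learn(grid)

# Two rooms (x < 4 and x >= 4) joined by a doorway at (4, 2); one option per room.
grid = DynamicGrid(obstacles={(4, 0), (4, 1), (4, 3)})
left = GridOption({(x, y) for x in range(5) for y in range(4)}, goal=(4, 2))
right = GridOption({(x, y) for x in range(4, 8) for y in range(4)}, goal=(7, 3))
for opt in (left, right):
    opt.learn(grid)
print(left.run(grid, (0, 0)))  # should reach the doorway (4, 2) in 6 steps

# An obstacle appears inside the left room: only `left` must re-learn.
grid.obstacles.add((2, 2))
relearn_affected([left, right], grid, changed={(2, 2)})
print(left.run(grid, (0, 0)))  # re-planned locally; the detour is still 6 steps

Under these assumptions, the high-level value table over subgoals never has to be rebuilt when a change is confined to one room, which is why the update cost stays bounded by the size of the affected local space rather than by the whole grid.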
Source: Control Theory & Applications, 2008, No. 1, pp. 71-74 (4 pages). Indexed in EI, CAS, CSCD; Peking University core journal.
Funding: China Postdoctoral Science Foundation (20060400809); Harbin Engineering University Basic Research Foundation (HEUFT07022, HEUFT05068, HEUFT05021).
Keywords: hierarchical reinforcement learning; dynamic environment; Option; strategy update

References (11)

  • 1 GAO Yang, CHEN Shifu, LU Xin. A survey of research on reinforcement learning [J]. Acta Automatica Sinica, 2004, 30(1): 86-100. (Cited by: 263)
  • 2 EXCELENTE-TOLEDO C B, JENNINGS N R. Using reinforcement learning to coordinate better [J]. Computational Intelligence, 2005, 21(3): 217-245.
  • 3 BARTO A G, MAHADEVAN S. Recent advances in hierarchical reinforcement learning [J]. Discrete Event Dynamic Systems: Theory and Applications, 2003, 13(4): 41-77.
  • 4 SUTTON R S, PRECUP D, SINGH S P. Between MDPs and semi-MDPs: a framework for temporal abstraction in reinforcement learning [J]. Artificial Intelligence, 1999, 112(1): 181-211.
  • 5 PARR R. Hierarchical control and learning for Markov decision processes [D]. Berkeley: University of California, 1998.
  • 6 DIETTERICH T G. Hierarchical reinforcement learning with the MAXQ value function decomposition [J]. J of Artificial Intelligence Research, 2000, 13(1): 227-303.
  • 7 PRECUP D. Temporal abstraction in reinforcement learning [D]. Amherst: University of Massachusetts, 2000.
  • 8 DIGNEY B L. Learning hierarchical control structures for multiple tasks and changing environments [C]// From Animals to Animats 5: Proc of the 5th Int Conf on Simulation of Adaptive Behavior. Cambridge: MIT Press, 1998: 321-330.
  • 9 MCGOVERN A, BARTO A. Autonomous discovery of subgoals in reinforcement learning using diverse density [C]// Proc of the 18th Int Conf on Machine Learning. San Francisco: Morgan Kaufmann, 2001: 361-368.
  • 10 MENACHE I, MANNOR S, SHIMKIN N. Q-Cut: dynamic discovery of sub-goals in reinforcement learning [C]// Proc of the 13th European Conf on Machine Learning. Berlin: Springer, 2002: 295-306.

