
基于概率模型的动态分层强化学习 (Cited by: 2)

Dynamic hierarchical reinforcement learning based on probability model
Abstract: To deal with the "curse of dimensionality" in large-scale reinforcement learning and the strong dependence of existing learning algorithms on prior knowledge, this paper proposes a dynamic hierarchical reinforcement-learning method based on a probability model (the DHRL-model). The method builds a state-transition probability model by Bayesian learning and identifies key states automatically from its probability parameters; it then generates state subspaces dynamically by clustering and learns the optimal policy under the resulting hierarchical structure. Simulation results show that the DHRL-model algorithm remarkably improves the agent's learning efficiency in complex environments and can be applied to learning in unknown, large-scale environments.
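The abstract outlines three model-based steps: Bayesian learning of a state-transition probability model, identification of key states from its probability parameters, and clustering into state subspaces. As a rough illustration of the first two steps only, the following is a minimal Python sketch assuming a tabular MDP with Dirichlet priors and an inflow-based key-state score; it is not the authors' implementation, and the names used (DirichletTransitionModel, key_state_scores) are hypothetical.

```python
# Illustrative sketch only -- NOT the paper's code. Assumes a small tabular MDP,
# a Dirichlet prior over each transition row, and an "inflow"-based key-state score.
import numpy as np


class DirichletTransitionModel:
    """Bayesian tabular model of P(s' | s, a) with a Dirichlet(alpha) prior per (s, a)."""

    def __init__(self, n_states: int, n_actions: int, alpha: float = 1.0):
        # counts[s, a, s'] start at the prior pseudo-count alpha
        self.counts = np.full((n_states, n_actions, n_states), alpha, dtype=float)

    def update(self, s: int, a: int, s_next: int) -> None:
        """Record one observed transition (s, a) -> s'; the posterior stays Dirichlet."""
        self.counts[s, a, s_next] += 1.0

    def posterior_mean(self) -> np.ndarray:
        """Posterior-mean transition probabilities, shape (S, A, S)."""
        return self.counts / self.counts.sum(axis=2, keepdims=True)


def key_state_scores(model: DirichletTransitionModel) -> np.ndarray:
    """A plausible probability-based key-state criterion (an assumption, not the paper's
    exact rule): a state scores high when, on average, many (s, a) pairs are likely to
    transition into it, i.e. it behaves like a bottleneck under the learned model."""
    p = model.posterior_mean()        # (S, A, S)
    inflow = p.mean(axis=(0, 1))      # mean probability of entering each state s'
    return inflow / inflow.sum()


if __name__ == "__main__":
    # Toy 6-state, 2-action chain: action 0 usually moves right, action 1 jumps to state 3,
    # so state 3 should emerge as a candidate key state.
    rng = np.random.default_rng(0)
    model = DirichletTransitionModel(n_states=6, n_actions=2, alpha=0.5)
    s = 0
    for _ in range(2000):
        a = int(rng.integers(2))
        s_next = min(s + 1, 5) if (a == 0 and rng.random() < 0.8) else 3
        model.update(s, a, s_next)
        s = s_next
    print("candidate key states (highest score first):", np.argsort(key_state_scores(model))[::-1])
```

Starting every row from a Dirichlet pseudo-count keeps the estimated transition probabilities strictly positive before any data arrive, which suits the unknown-environment setting the abstract targets; clustering these estimates into subspaces and learning a hierarchical policy on top would complete the pipeline the abstract describes.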
Source: Control Theory & Applications (《控制理论与应用》), 2011, No. 11: 1595-1600, 1606 (7 pages in total). Indexed in EI, CAS, CSCD, and the Peking University Core Journal list.
Funding: National Natural Science Foundation of China (60874042); China Postdoctoral Science Foundation, first-class grant (20080440177); China Postdoctoral Science Foundation, special grant (200902483); Research Fund for the Doctoral Program of Higher Education of China, New Teacher Fund (20090162120068).
Keywords: dynamic hierarchical reinforcement learning; Bayesian learning; state-transition probability model; agent

References (16)

1. 高阳, 陈世福, 陆鑫. A survey of research on reinforcement learning [J]. Acta Automatica Sinica (自动化学报), 2004, 30(1): 86-100. (Cited by: 268)
2. KAELBLING L P, LITTMAN M L. Reinforcement learning: a survey [J]. Journal of Artificial Intelligence Research, 1996, 4(1): 237-285.
3. STRENS M. A Bayesian framework for reinforcement learning [C] // Proceedings of the 17th International Conference on Machine Learning. San Francisco, CA, USA: Morgan Kaufmann Publishers Inc., 2000: 943-950.
4. 沈晶, 顾国昌, 刘海波. Research on dynamic hierarchy methods in hierarchical reinforcement learning [J]. Journal of Chinese Computer Systems (小型微型计算机系统), 2007, 28(2): 287-291. (Cited by: 1)
5. 彭志平, 李绍平. Research progress in hierarchical reinforcement learning [J]. Application Research of Computers (计算机应用研究), 2008, 25(4): 974-978. (Cited by: 7)
6. SUTTON R S, PRECUP D, SINGH S. Between MDPs and semi-MDPs: a framework for temporal abstraction in reinforcement learning [J]. Artificial Intelligence, 1999, 112(1): 181-211.
7. PARR R E. Hierarchical control and learning for Markov decision processes [D]. Berkeley, CA: University of California, 1998.
8. DIETTERICH T G. Hierarchical reinforcement learning with the MAXQ value function decomposition [J]. Journal of Artificial Intelligence Research, 2000, 13(1): 227-303.
9. HENGST B. Discovering hierarchy in reinforcement learning with HEXQ [C] // Proceedings of the 19th International Conference on Machine Learning. San Francisco, CA, USA: Morgan Kaufmann Publishers Inc., 2002: 243-250.
10. JONG N K, STONE P. Hierarchical model-based reinforcement learning: R-MAX + MAXQ [C] // Proceedings of the 25th International Conference on Machine Learning. New York: ACM, 2008: 432-439.

Secondary references (60)

1. LI Wei, YE Qingtai, ZHU Changming. Application of hierarchical reinforcement learning in engineering domain [J]. Journal of Systems Science and Systems Engineering, 2005, 14(2): 207-217. (Cited by: 3)
2. BARTO A G, MAHADEVAN S. Recent advances in hierarchical reinforcement learning [J]. Discrete Event Dynamic Systems: Theory and Applications, 2003, 13(4): 41-77.
3. SUTTON R S, PRECUP D, SINGH S P. Between MDPs and semi-MDPs: a framework for temporal abstraction in reinforcement learning [J]. Artificial Intelligence, 1999, 112(1): 181-211.
4. PARR R. Hierarchical control and learning for Markov decision processes [D]. Berkeley: University of California, 1998.
5. DIETTERICH T G. Hierarchical reinforcement learning with the MAXQ value function decomposition [J]. Journal of Artificial Intelligence Research, 2000, 13(1): 227-303.
6. DIGNEY B L. Learning hierarchical control structures for multiple tasks and changing environments [C] // Proceedings of the 5th International Conference on Simulation of Adaptive Behavior. Zurich, Switzerland, 1998: 321-330.
7. MCGOVERN A, BARTO A. Autonomous discovery of subgoals in reinforcement learning using diverse density [C] // Proceedings of the 18th International Conference on Machine Learning. San Francisco: Morgan Kaufmann, 2001: 361-368.
8. MENACHE I, MANNOR S, SHIMKIN N. Q-cut: dynamic discovery of sub-goals in reinforcement learning [C] // Proceedings of the 13th European Conference on Machine Learning. Helsinki, Finland, 2002: 295-306.
9. MANNOR S, et al. Dynamic abstraction in reinforcement learning via clustering [C] // Proceedings of the 21st International Conference on Machine Learning. Banff, Canada, 2004: 560-567.
10. PRECUP D. Temporal abstraction in reinforcement learning [D]. Amherst: University of Massachusetts, 2000.

Co-citing literature (273)

Co-cited literature (31)

1. SUTTON R S, BARTO A G. Reinforcement Learning: An Introduction [M]. Cambridge, MA: MIT Press, 1998.
2. LIU C, XU X, HU D. Multiobjective reinforcement learning: a comprehensive overview [J]. IEEE Transactions on Systems, Man, and Cybernetics, Part C: Applications and Reviews, 2013, 99(4): 1-13.
3. WIERING M, VAN OTTERLO M. Reinforcement Learning: State of the Art [M]. Berlin: Springer-Verlag, 2012, 10(3): 325-331.
4. BUSONIU L, BABUSKA R, DE SCHUTTER B. Reinforcement Learning and Dynamic Programming Using Function Approximators [M]. New York: CRC Press, 2010.
5. VAN DEN DRIES S, WIERING M A. Neural-fitted TD-leaf learning for playing Othello with structured neural networks [J]. IEEE Transactions on Neural Networks and Learning Systems, 2012, 23(11): 1701-1713.
6. SUTTON R S. Learning to predict by the methods of temporal differences [J]. Machine Learning, 1988, 3(1): 9-44.
7. DAYAN P, SEJNOWSKI T J. TD(λ) converges with probability 1 [J]. Machine Learning, 1994, 14(1): 295-301.
8. MIROLLI M, SANTUCCI V G, BALDASSARRE G. Phasic dopamine as a prediction error of intrinsic and extrinsic reinforcements driving both action acquisition and reward maximization: a simulated robotic study [J]. Neural Networks, 2013, 39(3): 40-51.
9. BHASIN S, KAMALAPURKAR R, JOHNSON M, et al. A novel actor-critic-identifier architecture for approximate optimal control of uncertain nonlinear systems [J]. Automatica, 2013, 49(1): 82-92.
10. BAIRD L C. Residual algorithms: reinforcement learning with function approximation [C] // Proceedings of the 12th International Conference on Machine Learning. San Francisco, CA, USA: Morgan Kaufmann Publishers Inc., 1995: 30-37.

Citing literature (2)

Secondary citing literature (6)
