Associated Model of Bi-Markov Decision Processes
Abstract  Human problem solving is typically organized on two levels: a problem is first grasped as a whole, yielding a rough overall plan, which is then carried out in concrete detail. Humans are thus a good example of a multi-resolutional intelligent system, able both to generalize bottom-up across levels (the granularity of the view of the problem becomes coarser, analogous to abstraction) and to instantiate top-down (the granularity becomes finer, analogous to specialization). On this basis, a semi-Markov decision process is constructed from two Markov decision processes running respectively on two levels, the ideal space (generalization) and the actual space (instantiation); it is called the associated model of bi-Markov decision processes. An algorithm for finding the optimal policy under this associated model is then discussed. Finally, an example shows that the associated bi-Markov decision process model economizes "thought" and offers a good tradeoff between computational validity and computational feasibility.
Source  Computer Science (《计算机科学》), CSCD, Peking University core journal, 2009, No. 9, pp. 161-166 (6 pages)
Funding  Supported by the National Natural Science Foundation of China (90412014, 60803061) and the Natural Science Foundation of Jiangsu Province (BK2008293)
Keywords  Markov decision processes, Reinforcement learning, Optimal policy
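The abstract describes the two-level construction only at a high level, and the paper's concrete model and optimal-policy algorithm are not reproduced in this record. Purely as an illustrative sketch of the general idea, the Python code below pairs an upper-level ("ideal", generalized) decision process over rooms with a lower-level ("actual", instantiated) policy over cells, and learns upper-level values with SMDP Q-learning in the spirit of the options framework for semi-Markov decision processes. The toy corridor, the room/cell decomposition, and all parameters are invented for this example and are not taken from the paper.

```python
import random
from collections import defaultdict

# Minimal illustrative sketch (not the paper's actual algorithm): a corridor of
# 12 cells grouped into 3 "rooms". The upper ("ideal") level is an MDP over
# rooms whose actions are temporally extended options; the lower ("actual")
# level executes each option cell by cell. Upper-level values are learned with
# SMDP Q-learning. All names and numbers here are invented for illustration.

N_CELLS = 12
CELLS_PER_ROOM = 4
GOAL = N_CELLS - 1
ROOMS = range(N_CELLS // CELLS_PER_ROOM)

def room_of(cell):
    """Generalization: map a concrete cell to its abstract room."""
    return cell // CELLS_PER_ROOM

def lower_step(cell, target_room):
    """Instantiation: one greedy lower-level move toward the target room
    (and, inside the goal room, toward the goal cell)."""
    if room_of(cell) < target_room:
        return cell + 1
    if room_of(cell) > target_room:
        return cell - 1
    return min(cell + 1, GOAL)

def option_done(cell, target_room):
    """The option 'reach target_room' terminates on entering that room,
    except that the goal room's option runs until the goal cell itself."""
    if room_of(cell) != target_room:
        return False
    return target_room != room_of(GOAL) or cell == GOAL

def smdp_q_learning(episodes=500, alpha=0.1, gamma=0.95, eps=0.1):
    """Upper-level SMDP Q-learning over (room, target_room) pairs."""
    q = defaultdict(float)
    for _ in range(episodes):
        cell = 0
        while cell != GOAL:
            s = room_of(cell)
            choices = [r for r in ROOMS if r != s]
            # epsilon-greedy choice of the abstract action (target room)
            if random.random() < eps:
                a = random.choice(choices)
            else:
                a = max(choices, key=lambda r: q[(s, r)])
            # Execute the option, accumulating the discounted reward over
            # the k primitive steps it takes (the semi-Markov part).
            ret, discount = 0.0, 1.0
            while not option_done(cell, a):
                cell = lower_step(cell, a)
                reward = 1.0 if cell == GOAL else -0.01
                ret += discount * reward
                discount *= gamma
            s2 = room_of(cell)
            best_next = 0.0 if cell == GOAL else max(
                q[(s2, r)] for r in ROOMS if r != s2)
            q[(s, a)] += alpha * (ret + discount * best_next - q[(s, a)])
    return q

if __name__ == "__main__":
    q = smdp_q_learning()
    for (s, a), v in sorted(q.items()):
        print(f"room {s} -> target room {a}: {v:.3f}")
```

The semi-Markov character comes from the upper-level update, which backs up the discounted reward accumulated over however many primitive steps the chosen option takes, while the lower level supplies the step-by-step instantiation.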