
HAMs-family Based Method for Hierarchical Decomposition
Abstract: This paper introduces the notion of "policy-coupled" semi-Markov decision processes (SMDPs) into the HAMs framework. It defines the concept of HAM-decomposability, clarifies the relations among HAM machines, HAM-decomposability, and policy-coupled SMDPs, and proves that the HAMs framework is suitable for solving policy-coupled SMDP problems. On this basis, for a class of policy-coupled SMDPs whose call graph forms a directed acyclic graph (DAG), the paper proposes a hierarchical decomposition method and gives a precondition for judging whether a valid hierarchical decomposition can be generated. Finally, a typical experiment illustrates the characteristics of the method.
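The decomposition the abstract describes rests on two ingredients: abstract machines whose states either emit primitive actions, call child machines, or return, and the precondition that the call graph among machines be acyclic (a DAG). The sketch below is a minimal illustration of these two ideas only; the machine encoding, the `run`/`is_dag` helpers, and the example machines are assumptions for exposition and do not reproduce the paper's formal construction.

```python
# Illustrative sketch (assumed encoding, not the paper's formalism):
# each machine is a list of states of the form
#   ("act", a)   - emit primitive action a
#   ("call", m)  - transfer control to child machine m
#   ("stop",)    - return control to the caller
from collections import deque

MACHINES = {
    "root":     [("call", "go_left"), ("call", "go_right"), ("stop",)],
    "go_left":  [("act", "L"), ("act", "L"), ("stop",)],
    "go_right": [("act", "R"), ("stop",)],
}

def run(machine, machines=MACHINES):
    """Execute a machine hierarchy with a call stack of (machine, pc)
    frames; return the flat sequence of primitive actions emitted."""
    actions, stack = [], [(machine, 0)]
    while stack:
        name, pc = stack.pop()
        state = machines[name][pc]
        if state[0] == "act":
            actions.append(state[1])
            stack.append((name, pc + 1))
        elif state[0] == "call":
            stack.append((name, pc + 1))   # resume caller afterwards
            stack.append((state[1], 0))    # descend into callee
        # ("stop",): pop without pushing, returning to the caller
    return actions

def is_dag(machines):
    """Topological-sort test that the call graph is acyclic - the shape
    of call graph the abstract's decomposition method addresses."""
    edges = {m: [s[1] for s in sts if s[0] == "call"]
             for m, sts in machines.items()}
    indeg = {m: 0 for m in machines}
    for children in edges.values():
        for c in children:
            indeg[c] += 1
    queue = deque(m for m, d in indeg.items() if d == 0)
    seen = 0
    while queue:
        m = queue.popleft()
        seen += 1
        for c in edges[m]:
            indeg[c] -= 1
            if indeg[c] == 0:
                queue.append(c)
    return seen == len(machines)
```

Running `run("root")` yields `["L", "L", "R"]`: the root machine's behavior is assembled from the policies of its child machines, which is the coupling that a DAG-shaped hierarchy lets one decompose level by level.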
Published in: Journal of Chinese Computer Systems (《小型微型计算机系统》, CSCD, Peking University core journal), 2008, No. 4, pp. 653-658 (6 pages).
Funding: Supported by the National Natural Science Foundation of China (Grant No. 60503048).
Keywords: hierarchical reinforcement learning; hierarchies of abstract machines; policy-coupled semi-Markov decision processes


相关作者

内容加载中请稍等...

相关机构

内容加载中请稍等...

相关主题

内容加载中请稍等...

浏览历史

内容加载中请稍等...
;
使用帮助 返回顶部