HAMs-family Based Method for Hierarchical Decomposition


Abstract: This paper introduces the notion of "policy-coupled" semi-Markov decision processes (SMDPs) into the HAMs framework. It defines the concept of HAM-decomposability and clarifies the relationships among HAM machines, HAM-decomposability, and policy-coupled SMDPs, proving that the HAM framework is suited to solving policy-coupled SMDP problems. Building on this, it proposes a hierarchical decomposition method for a class of policy-coupled SMDPs whose call graph is a directed acyclic graph (DAG), and gives a condition for judging whether a hierarchical decomposition is valid. Finally, a typical experiment illustrates the characteristics of the method.
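The abstract restricts the method to problems whose call graph is a DAG. The following is a minimal sketch of that structural assumption only, not the paper's decomposition method or its validity condition, which the abstract does not spell out. The machine names and the `bottom_up_order` helper are hypothetical; the sketch checks that a call graph is acyclic and, if so, lists machines so that every callee comes before its callers.

```python
from graphlib import TopologicalSorter, CycleError


def bottom_up_order(call_graph):
    """Return machines with callees before callers, or None if the graph has a cycle.

    call_graph maps each machine name to the set of machines it calls
    (a hypothetical representation chosen for this illustration).
    """
    try:
        # TopologicalSorter treats the mapped values as predecessors,
        # so callees are emitted before the machines that call them.
        return list(TopologicalSorter(call_graph).static_order())
    except CycleError:
        return None


# Hypothetical call graph: two sub-machines share a common callee.
call_graph = {
    "Root": {"Navigate", "Collect"},
    "Navigate": {"Move"},
    "Collect": {"Move"},
    "Move": set(),
}
order = bottom_up_order(call_graph)
```

A cyclic graph such as `{"A": {"B"}, "B": {"A"}}` makes the helper return `None`, which would place the problem outside the class the abstract describes.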

Authors: Du Xiaoqin (杜小勤), Li Qinghua (李庆华), Han Jianjun (韩建军)

Affiliation: School of Computer Science and Technology, Huazhong University of Science and Technology

Source: Journal of Chinese Computer Systems (《小型微型计算机系统》), indexed in CSCD and the Peking University Core Journals list, 2008, No. 4, pp. 653-658 (6 pages)

Funding: Supported by the National Natural Science Foundation of China, General Program (60503048)

Keywords: hierarchical reinforcement learning; hierarchies of abstract machines; policy-coupled semi-Markov decision processes (SMDPs)

Classification: TP391 [Automation and Computer Technology: Computer Application Technology]

