
Option Automatic Generation Algorithm Based on ACCA

Cited by: 1
Abstract: A new algorithm for automatically generating Options in hierarchical reinforcement learning (HRL) is presented. The algorithm takes as input the state space explored by the Agent during the initial learning phase and clusters those states with an improved Ant Colony Clustering Algorithm (ACCA). On each of the resulting state subsets, a set of internal policies is learned through experience replay, and the Options are generated from them. Simulation experiments demonstrate the validity of the algorithm.
Source: Computer Engineering and Applications (《计算机工程与应用》), indexed in CSCD and the Peking University Core Journals list; 2008, No. 19, pp. 39-40, 49 (3 pages).
Keywords: hierarchical reinforcement learning; Option; Ant Colony Clustering Algorithm (ACCA); experience replay
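The pipeline outlined in the abstract (explore, cluster the visited states, then learn one internal policy per cluster by experience replay) can be sketched roughly as follows. This is only an illustrative sketch under stated assumptions, not the authors' implementation: the 6x6 grid world, every function name, the learning parameters, and the pseudo-reward of 1 for leaving a cluster are hypothetical, and the clustering step is a trivial left/right split standing in for the improved ACCA, whose details the record does not give.

```python
import random
from collections import defaultdict

ACTIONS = [(0, 1), (0, -1), (1, 0), (-1, 0)]  # right, left, down, up
SIZE = 6  # hypothetical 6x6 grid world (assumption, not from the paper)


def step(state, action):
    """Deterministic grid move; bumping into a wall leaves the state unchanged."""
    r, c = state[0] + action[0], state[1] + action[1]
    return (r, c) if 0 <= r < SIZE and 0 <= c < SIZE else state


def explore(n_steps, rng):
    """Initial learning phase: a random walk that records (s, a, s') transitions."""
    state, transitions = (0, 0), []
    for _ in range(n_steps):
        a = rng.randrange(len(ACTIONS))
        nxt = step(state, ACTIONS[a])
        transitions.append((state, a, nxt))
        state = nxt
    return transitions


def cluster_states(states):
    """Stand-in for ACCA (NOT the paper's method): split states by grid half."""
    return {s: 0 if s[1] < SIZE // 2 else 1 for s in states}


def learn_intra_policy(transitions, labels, cluster_id, rng,
                       sweeps=50, alpha=0.5, gamma=0.9):
    """Q-learning that replays stored transitions starting inside one cluster.

    Assumed pseudo-reward: 1 when a transition leaves the cluster (the option
    terminates there), 0 otherwise.
    """
    replay = [t for t in transitions if labels[t[0]] == cluster_id]
    q = defaultdict(float)
    for _ in range(sweeps):
        rng.shuffle(replay)
        for s, a, nxt in replay:
            if labels[nxt] != cluster_id:
                target = 1.0
            else:
                target = gamma * max(q[(nxt, b)] for b in range(len(ACTIONS)))
            q[(s, a)] += alpha * (target - q[(s, a)])
    states = {s for s, _, _ in replay}
    # Greedy policy: index into ACTIONS with the highest learned value.
    return {s: max(range(len(ACTIONS)), key=lambda b: q[(s, b)]) for s in states}


def generate_options(n_steps=5000, seed=0):
    """Build one option (initiation set plus internal policy) per cluster."""
    rng = random.Random(seed)
    transitions = explore(n_steps, rng)
    states = {s for s, _, _ in transitions} | {n for _, _, n in transitions}
    labels = cluster_states(states)
    options = {}
    for cid in sorted(set(labels.values())):
        options[cid] = {
            "initiation_set": {s for s in states if labels[s] == cid},
            "policy": learn_intra_policy(transitions, labels, cid, rng),
        }
    return options
```

Calling `generate_options()` returns a dictionary keyed by cluster id. In this toy setting each learned policy steers the agent toward the boundary of its own cluster, which is one plausible reading of an option's termination condition; the paper itself may define it differently.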

