
An improved automatic hierarchical algorithm: BMAXQ (cited by 1)

BMAXQ: improved algorithm of hierarchical reinforcement learning
Abstract: To address the shortcomings of the MAXQ algorithm, an improved hierarchical reinforcement learning algorithm, BMAXQ, is proposed. The method modifies MAXQ's abstraction mechanism and exploits the properties of a BP (backpropagation) neural network, enabling the agent to discover subtasks automatically, learn each layer of the hierarchy in parallel, and adapt to learning tasks in dynamic environments.
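Background (standard MAXQ material, not taken from this record, included only to make the abstract's claim concrete): MAXQ decomposes the value function of the root task over a hand-designed subtask hierarchy. For a composite subtask i, child action a, and state s, Dietterich's decomposition is

    Q(i, s, a) = V(a, s) + C(i, s, a),        V(i, s) = max_a Q(i, s, a),

where V(a, s) is the value of completing child a from s (the expected immediate reward when a is primitive) and C(i, s, a) is the completion value of finishing i after a terminates. The abstraction mechanism BMAXQ modifies is this reliance on a programmer-supplied hierarchy: per the abstract, the BP network discovers the subtasks instead.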
Source: Computer Engineering and Applications (《计算机工程与应用》), CSCD / Peking University core journal, 2011, No. 30, pp. 1-3 (3 pages).
Funding: National Natural Science Foundation of China (No. 60873139); Natural Science Foundation of Shanxi Province (No. 2008011040); Open Project of the State Key Laboratory of Virtual Reality Technology and Systems, Beihang University (No. SKVR-KF-09-04).
Keywords: hierarchical reinforcement learning; MAXQ algorithm; BP neural network; subtask
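
The abstract only outlines how the BP network is used, so the following is a minimal sketch of one plausible reading, not the authors' implementation: a small backpropagation network scores states as subtask-boundary candidates from trajectory statistics, in the spirit of the subgoal-discovery work in reference 5 below. SubgoalNet, the two features, and the toy labels are all assumptions introduced here for illustration.

import numpy as np

class SubgoalNet:
    """Hypothetical two-layer BP network: state features -> P(subtask boundary)."""
    def __init__(self, n_in, n_hidden=8, lr=0.5, seed=0):
        rng = np.random.default_rng(seed)
        self.W1 = rng.normal(0.0, 0.5, (n_in, n_hidden))
        self.b1 = np.zeros(n_hidden)
        self.W2 = rng.normal(0.0, 0.5, (n_hidden, 1))
        self.b2 = np.zeros(1)
        self.lr = lr

    @staticmethod
    def _sigmoid(x):
        return 1.0 / (1.0 + np.exp(-x))

    def forward(self, x):
        self.h = self._sigmoid(x @ self.W1 + self.b1)    # hidden activations
        return self._sigmoid(self.h @ self.W2 + self.b2)

    def train_step(self, x, target):
        # One backpropagation step on squared error; both layers are sigmoid.
        y = self.forward(x)
        dy = (y - target) * y * (1.0 - y)                # output-layer delta
        dh = (dy @ self.W2.T) * self.h * (1.0 - self.h)  # hidden-layer delta
        self.W2 -= self.lr * np.outer(self.h, dy)
        self.b2 -= self.lr * dy
        self.W1 -= self.lr * np.outer(x, dh)
        self.b1 -= self.lr * dh

# Toy usage. Feature per state (assumed): (visit rate on successful
# episodes, overall visit rate). A state visited on most successful
# trajectories but not everywhere is a plausible subtask boundary; each
# discovered boundary would seed a MAXQ subtask that can then be learned
# in parallel with the others.
feats = np.array([[0.9, 0.5],   # doorway-like state: frequent on successes
                  [0.2, 0.6],   # common everywhere: poor candidate
                  [0.1, 0.1]])  # rarely visited
labels = np.array([1.0, 0.0, 0.0])
net = SubgoalNet(n_in=2)
for _ in range(2000):
    for x, t in zip(feats, labels):
        net.train_step(x, t)
print("subgoal scores:", [round(float(net.forward(x)[0]), 3) for x in feats])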

References (6)

  • 1. Barto A G, Mahadevan S. Recent advances in hierarchical reinforcement learning[J]. Discrete Event Dynamic Systems, 2003, 13(4): 341-379.
  • 2. Mehta N, Ray S, Tadepalli P, et al. Automatic discovery and transfer of MAXQ hierarchies[C]//Proceedings of the Twenty-Fifth International Conference on Machine Learning, 2008: 648-655.
  • 3. Shen Jing, Gu Guochang, Liu Haibo. Automatic Option generation algorithm in hierarchical reinforcement learning[J]. Computer Engineering and Applications (计算机工程与应用), 2005, 41(34): 4-6.
  • 4. Jong N K, Stone P. Hierarchical model-based reinforcement learning: R-max + MAXQ[C]//Proceedings of the 25th International Conference on Machine Learning, 2008: 432-439.
  • 5. McGovern A, Barto A G. Accelerating reinforcement learning through the discovery of useful subgoals[C]//Proceedings of the 6th International Symposium on Artificial Intelligence, Robotics, and Automation in Space (i-SAIRAS), 2001.
  • 6. Kuzmin V. Connectionist Q-learning in robot control task[R]. Scientific Proceedings of Riga Technical University, Series "Computer Science", 2002, 33.

