
An improved automatic hierarchical algorithm: BMAXQ (cited by 1)

BMAXQ: improved algorithm of hierarchical reinforcement learning
Abstract: To address the shortcomings of the MAXQ algorithm, an improved hierarchical reinforcement learning algorithm, BMAXQ, is proposed. The method modifies MAXQ's abstraction mechanism and exploits the properties of a BP (backpropagation) neural network, enabling the agent to discover subtasks automatically, learn each layer of the hierarchy in parallel, and adapt to learning tasks in dynamic environments.
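Background (standard MAXQ material, not taken from this record, included only to make the abstract's claim concrete): MAXQ decomposes the value function of the root task over a hand-designed subtask hierarchy. For a composite subtask i, child action a, and state s, Dietterich's decomposition is

    Q(i, s, a) = V(a, s) + C(i, s, a),        V(i, s) = max_a Q(i, s, a),

where V(a, s) is the value of completing child a from s (the expected immediate reward when a is primitive) and C(i, s, a) is the completion value of finishing i after a terminates. The abstraction mechanism BMAXQ modifies is this reliance on a programmer-supplied hierarchy: per the abstract, the BP network discovers the subtasks instead.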
Source: Computer Engineering and Applications (《计算机工程与应用》), CSCD / Peking University core journal, 2011, No. 30, pp. 1-3 (3 pages).
Funding: National Natural Science Foundation of China (No. 60873139); Natural Science Foundation of Shanxi Province (No. 2008011040); Open Project of the State Key Laboratory of Virtual Reality Technology and Systems, Beihang University (No. SKVR-KF-09-04).
Keywords: hierarchical reinforcement learning; MAXQ algorithm; BP neural network; subtask
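
The abstract only outlines how the BP network is used, so the following is a minimal sketch of one plausible reading, not the authors' implementation: a small backpropagation network scores states as subtask-boundary candidates from trajectory statistics, in the spirit of the subgoal-discovery work in reference 5 below. SubgoalNet, the two features, and the toy labels are all assumptions introduced here for illustration.

import numpy as np

class SubgoalNet:
    """Hypothetical two-layer BP network: state features -> P(subtask boundary)."""
    def __init__(self, n_in, n_hidden=8, lr=0.5, seed=0):
        rng = np.random.default_rng(seed)
        self.W1 = rng.normal(0.0, 0.5, (n_in, n_hidden))
        self.b1 = np.zeros(n_hidden)
        self.W2 = rng.normal(0.0, 0.5, (n_hidden, 1))
        self.b2 = np.zeros(1)
        self.lr = lr

    @staticmethod
    def _sigmoid(x):
        return 1.0 / (1.0 + np.exp(-x))

    def forward(self, x):
        self.h = self._sigmoid(x @ self.W1 + self.b1)    # hidden activations
        return self._sigmoid(self.h @ self.W2 + self.b2)

    def train_step(self, x, target):
        # One backpropagation step on squared error; both layers are sigmoid.
        y = self.forward(x)
        dy = (y - target) * y * (1.0 - y)                # output-layer delta
        dh = (dy @ self.W2.T) * self.h * (1.0 - self.h)  # hidden-layer delta
        self.W2 -= self.lr * np.outer(self.h, dy)
        self.b2 -= self.lr * dy
        self.W1 -= self.lr * np.outer(x, dh)
        self.b1 -= self.lr * dh

# Toy usage. Feature per state (assumed): (visit rate on successful
# episodes, overall visit rate). A state visited on most successful
# trajectories but not everywhere is a plausible subtask boundary; each
# discovered boundary would seed a MAXQ subtask that can then be learned
# in parallel with the others.
feats = np.array([[0.9, 0.5],   # doorway-like state: frequent on successes
                  [0.2, 0.6],   # common everywhere: poor candidate
                  [0.1, 0.1]])  # rarely visited
labels = np.array([1.0, 0.0, 0.0])
net = SubgoalNet(n_in=2)
for _ in range(2000):
    for x, t in zip(feats, labels):
        net.train_step(x, t)
print("subgoal scores:", [round(float(net.forward(x)[0]), 3) for x in feats])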

References (6)

  • 1. Barto A G, Mahadevan S. Recent advances in hierarchical reinforcement learning[J]. Discrete Event Dynamic Systems, 2003, 13(4): 341-379.
  • 2. Mehta N, Ray S, Tadepalli P, et al. Automatic discovery and transfer of MAXQ hierarchies[C]//Proceedings of the Twenty-Fifth International Conference on Machine Learning, 2008: 648-655.
  • 3. Shen Jing, Gu Guochang, Liu Haibo. Automatic Option generation algorithm in hierarchical reinforcement learning[J]. Computer Engineering and Applications (计算机工程与应用), 2005, 41(34): 4-6.
  • 4. Jong N K, Stone P. Hierarchical model-based reinforcement learning: R-max + MAXQ[C]//Proceedings of the 25th International Conference on Machine Learning, 2008: 432-439.
  • 5. McGovern A, Barto A G. Accelerating reinforcement learning through the discovery of useful subgoals[C]//Proceedings of the 6th International Symposium on Artificial Intelligence, Robotics, and Automation in Space (i-SAIRAS), 2001.
  • 6. Kuzmin V. Connectionist Q-learning in robot control task[R]. Scientific Proceedings of Riga Technical University, Series "Computer Science", 2002, 33.

