
Dynamic Hierarchies in Hierarchical Reinforcement Learning (Cited by: 1)
Abstract: Existing automatic hierarchy methods in hierarchical reinforcement learning all construct the hierarchical structure in a single pass, after the state space has been explored to some extent. Insufficient exploration cannot guarantee solution quality, while excessive exploration slows learning. To overcome this strong dependence of learning performance on the degree of state-space exploration, this paper proposes a dynamic hierarchy method that integrates immune clustering and a secondary-response mechanism into the Options framework for hierarchical reinforcement learning proposed by Sutton. The method adjusts the state space of an Option dynamically and generates the Option's internal policy dynamically along the learning trajectory. Simulation experiments on shortest-path planning between two points in a two-dimensional grid world with obstacles show that the dynamic hierarchy method depends only weakly on the degree of state-space exploration and is better suited to large-scale reinforcement learning problems.
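The record gives only the abstract, not the algorithm itself. As a rough illustration of the Options formalism (Sutton, Precup and Singh, 1999) that the method builds on, the Python sketch below models an option o = (I, pi, beta) with a hypothetical hook for growing its state space and internal policy along a learning trajectory, together with the standard SMDP Q-learning update over options. All names in the sketch (Option, absorb_state, smdp_q_update) are illustrative assumptions; the paper's immune-clustering and secondary-response mechanisms are not reproduced here.

# A minimal sketch of the Options abstraction, NOT the paper's method:
# the immune-clustering and secondary-response components are omitted.
from dataclasses import dataclass, field
from typing import Callable, Dict, Iterable, Set, Tuple

State = Tuple[int, int]   # (row, col) in a 2-D grid world with obstacles

@dataclass
class Option:
    """An option o = (I, pi, beta): initiation set, internal policy,
    and termination condition (Sutton, Precup, Singh, 1999)."""
    initiation_set: Set[State]                                # I
    policy: Dict[State, int] = field(default_factory=dict)    # pi: state -> action
    termination: Callable[[State], float] = lambda s: 0.0     # beta: P(stop | s)

    def absorb_state(self, s: State, action: int) -> None:
        """Hypothetical hook: extend the option's state space and
        internal policy along the current learning trajectory,
        instead of fixing them after a one-off exploration phase."""
        self.initiation_set.add(s)
        self.policy[s] = action

def smdp_q_update(Q: Dict[Tuple[State, int], float],
                  s: State, o: int, reward: float, s_next: State,
                  k: int, option_ids: Iterable[int],
                  alpha: float = 0.1, gamma: float = 0.95) -> None:
    """Standard SMDP Q-learning update over options:
    Q(s,o) += alpha * [r + gamma^k * max_{o'} Q(s',o') - Q(s,o)],
    where o ran k primitive steps and collected discounted reward r."""
    best_next = max(Q.get((s_next, o2), 0.0) for o2 in option_ids)
    old = Q.get((s, o), 0.0)
    Q[(s, o)] = old + alpha * (reward + gamma ** k * best_next - old)

if __name__ == "__main__":
    corridor = Option(initiation_set={(0, 0), (0, 1)})
    corridor.absorb_state((0, 2), action=1)   # grow it along a trajectory
    Q: Dict[Tuple[State, int], float] = {}
    smdp_q_update(Q, (0, 0), 0, reward=1.0, s_next=(0, 2), k=3,
                  option_ids=[0])
    print(Q)   # {((0, 0), 0): 0.1}

The absorb_state hook only marks where a dynamic hierarchy method could keep modifying I and pi during learning, in contrast to methods that freeze the hierarchy after a one-off exploration phase.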
Source: Journal of Chinese Computer Systems (《小型微型计算机系统》, CSCD, Peking University Core), 2007, No. 2: 287-291 (5 pages)
Funding: Supported by the National Defense Basic Research Program and the Harbin Engineering University Basic Research Fund (HEUFT05021, HEUFT05068).
Keywords: hierarchical reinforcement learning; dynamic hierarchy; immune clustering; secondary response

References (13)

  • 1 Barto A G, Mahadevan S. Recent advances in hierarchical reinforcement learning[J]. Discrete Event Dynamic Systems: Theory and Applications, 2003, 13(4): 41-77.
  • 2 Sutton R S, Precup D, Singh S P. Between MDPs and semi-MDPs: a framework for temporal abstraction in reinforcement learning[J]. Artificial Intelligence, 1999, 112(1): 181-211.
  • 3 Parr R. Hierarchical control and learning for Markov decision processes[D]. Berkeley: University of California, 1998.
  • 4 Dietterich T G. Hierarchical reinforcement learning with the MAXQ value function decomposition[J]. Journal of Artificial Intelligence Research, 2000, 13(1): 227-303.
  • 5 Digney B L. Learning hierarchical control structures for multiple tasks and changing environments[C]. In: Proc. of the 5th International Conference on Simulation of Adaptive Behavior, Zurich, Switzerland, 1998: 321-330.
  • 6 McGovern A, Barto A. Automatic discovery of subgoals in reinforcement learning using diverse density[C]. In: Proc. of the 18th International Conference on Machine Learning, San Francisco: Morgan Kaufmann, 2001: 361-368.
  • 7 Menache I, Mannor S, Shimkin N. Q-cut: dynamic discovery of sub-goals in reinforcement learning[C]. In: Proc. of the 13th European Conference on Machine Learning, Helsinki, Finland, 2002: 295-306.
  • 8 Mannor S, et al. Dynamic abstraction in reinforcement learning via clustering[C]. In: Proc. of the 21st International Conference on Machine Learning, Banff, Canada, 2004: 560-567.
  • 9 Precup D. Temporal abstraction in reinforcement learning[D]. Amherst: University of Massachusetts, 2000.
  • 10 Xiao Ren-bin, Wang Lei. Artificial immune system: principle, models, analysis and perspectives[J]. Chinese Journal of Computers, 2002, 25(12): 1281-1293. (in Chinese)

