
Dynamic Hierarchies in Hierarchical Reinforcement Learning (Cited by: 1)
Abstract: Existing automatic hierarchy methods in hierarchical reinforcement learning all construct the hierarchical structure in a single pass, after the state space has been explored to some extent. Insufficient exploration cannot guarantee solution quality, while excessive exploration slows learning. To overcome this strong dependence of learning performance on the degree of state-space exploration, this paper proposes a dynamic hierarchy method that integrates immune clustering and a secondary-response mechanism into the Options framework for hierarchical reinforcement learning proposed by Sutton. The method adjusts the state space of an Option dynamically and generates the Option's internal policy dynamically along the learning trajectory. Simulation experiments on shortest-path planning between two points in a two-dimensional grid world with obstacles show that the dynamic hierarchy method depends only weakly on the degree of state-space exploration and is better suited to large-scale reinforcement learning problems.
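The record gives only the abstract, not the algorithm itself. As a rough illustration of the Options formalism (Sutton, Precup and Singh, 1999) that the method builds on, the Python sketch below models an option o = (I, pi, beta) with a hypothetical hook for growing its state space and internal policy along a learning trajectory, together with the standard SMDP Q-learning update over options. All names in the sketch (Option, absorb_state, smdp_q_update) are illustrative assumptions; the paper's immune-clustering and secondary-response mechanisms are not reproduced here.

# A minimal sketch of the Options abstraction, NOT the paper's method:
# the immune-clustering and secondary-response components are omitted.
from dataclasses import dataclass, field
from typing import Callable, Dict, Iterable, Set, Tuple

State = Tuple[int, int]   # (row, col) in a 2-D grid world with obstacles

@dataclass
class Option:
    """An option o = (I, pi, beta): initiation set, internal policy,
    and termination condition (Sutton, Precup, Singh, 1999)."""
    initiation_set: Set[State]                                # I
    policy: Dict[State, int] = field(default_factory=dict)    # pi: state -> action
    termination: Callable[[State], float] = lambda s: 0.0     # beta: P(stop | s)

    def absorb_state(self, s: State, action: int) -> None:
        """Hypothetical hook: extend the option's state space and
        internal policy along the current learning trajectory,
        instead of fixing them after a one-off exploration phase."""
        self.initiation_set.add(s)
        self.policy[s] = action

def smdp_q_update(Q: Dict[Tuple[State, int], float],
                  s: State, o: int, reward: float, s_next: State,
                  k: int, option_ids: Iterable[int],
                  alpha: float = 0.1, gamma: float = 0.95) -> None:
    """Standard SMDP Q-learning update over options:
    Q(s,o) += alpha * [r + gamma^k * max_{o'} Q(s',o') - Q(s,o)],
    where o ran k primitive steps and collected discounted reward r."""
    best_next = max(Q.get((s_next, o2), 0.0) for o2 in option_ids)
    old = Q.get((s, o), 0.0)
    Q[(s, o)] = old + alpha * (reward + gamma ** k * best_next - old)

if __name__ == "__main__":
    corridor = Option(initiation_set={(0, 0), (0, 1)})
    corridor.absorb_state((0, 2), action=1)   # grow it along a trajectory
    Q: Dict[Tuple[State, int], float] = {}
    smdp_q_update(Q, (0, 0), 0, reward=1.0, s_next=(0, 2), k=3,
                  option_ids=[0])
    print(Q)   # {((0, 0), 0): 0.1}

The absorb_state hook only marks where a dynamic hierarchy method could keep modifying I and pi during learning, in contrast to methods that freeze the hierarchy after a one-off exploration phase.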
Source: Journal of Chinese Computer Systems (《小型微型计算机系统》, CSCD, Peking University Core), 2007, No. 2: 287-291 (5 pages)
Funding: Supported by the National Defense Basic Research Program and the Harbin Engineering University Basic Research Fund (HEUFT05021, HEUFT05068).
Keywords: hierarchical reinforcement learning; dynamic hierarchy; immune clustering; secondary response

References (13)

  • 1 Barto A G, Mahadevan S. Recent advances in hierarchical reinforcement learning[J]. Discrete Event Dynamic Systems: Theory and Applications, 2003, 13(4): 41-77.
  • 2 Sutton R S, Precup D, Singh S P. Between MDPs and semi-MDPs: a framework for temporal abstraction in reinforcement learning[J]. Artificial Intelligence, 1999, 112(1): 181-211.
  • 3 Parr R. Hierarchical control and learning for Markov decision processes[D]. Berkeley: University of California, 1998.
  • 4 Dietterich T G. Hierarchical reinforcement learning with the MAXQ value function decomposition[J]. Journal of Artificial Intelligence Research, 2000, 13(1): 227-303.
  • 5 Digney B L. Learning hierarchical control structures for multiple tasks and changing environments[C]. In: Proc. of the 5th International Conference on Simulation of Adaptive Behavior, Zurich, Switzerland, 1998: 321-330.
  • 6 McGovern A, Barto A. Automatic discovery of subgoals in reinforcement learning using diverse density[C]. In: Proc. of the 18th International Conference on Machine Learning, San Francisco: Morgan Kaufmann, 2001: 361-368.
  • 7 Menache I, Mannor S, Shimkin N. Q-cut: dynamic discovery of sub-goals in reinforcement learning[C]. In: Proc. of the 13th European Conference on Machine Learning, Helsinki, Finland, 2002: 295-306.
  • 8 Mannor S, et al. Dynamic abstraction in reinforcement learning via clustering[C]. In: Proc. of the 21st International Conference on Machine Learning, Banff, Canada, 2004: 560-567.
  • 9 Precup D. Temporal abstraction in reinforcement learning[D]. Amherst: University of Massachusetts, 2000.
  • 10 Xiao Ren-bin, Wang Lei. Artificial immune system: principle, models, analysis and perspectives[J]. Chinese Journal of Computers, 2002, 25(12): 1281-1293. (in Chinese)

