
Option Subgoal Discovery Algorithm Based on Exploration Density

Discovery Algorithm for Option Based on Exploration Density
Abstract: The concept of state exploration density (ED) is proposed. By inspecting how strongly each state influences the agent's ability to explore the environment, the method discovers subgoals for learning and constructs the corresponding Options. A reinforcement learning algorithm that creates Options in this way learns markedly faster, as the simulation results show. The method is task-independent, requires no prior knowledge, and the Options it constructs can be shared directly among different tasks in the same environment.
Source: Pattern Recognition and Artificial Intelligence (《模式识别与人工智能》), 2007, No. 2, pp. 236-240 (5 pages). Indexed in EI, CSCD, and the Peking University Core Journals list.
Keywords: Hierarchical Reinforcement Learning, Option, Exploration Density (ED)
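
The abstract describes the approach only at a high level and does not give the exact formula for exploration density, so the Python sketch below is a hypothetical illustration rather than the paper's algorithm. It assumes ED can be approximated by how often purely exploratory (random-action) trajectories pass through a state, treats high-scoring states as subgoals, and wraps each subgoal as an Option in the Sutton-Precup-Singh sense (initiation set I, internal policy pi, termination condition beta). All function names and parameters here (collect_trajectories, exploration_density, discover_subgoals, make_option, threshold) are invented for this sketch.

```python
from collections import defaultdict
import random

def collect_trajectories(reset, step, actions, n_episodes=200, max_steps=100):
    """Roll out purely exploratory (random-action) episodes.
    `reset()` returns a start state; `step(state, action)` returns the next state."""
    trajectories = []
    for _ in range(n_episodes):
        s = reset()
        traj = [s]
        for _ in range(max_steps):
            s = step(s, random.choice(actions))
            traj.append(s)
        trajectories.append(traj)
    return trajectories

def exploration_density(trajectories):
    """Crude stand-in for ED: the fraction of exploratory episodes that visit
    each state. States the agent must pass through in order to keep exploring
    (e.g. doorways in a grid world) tend to score high."""
    episode_visits = defaultdict(int)
    for traj in trajectories:
        for s in set(traj):  # count each state at most once per episode
            episode_visits[s] += 1
    n = len(trajectories)
    return {s: v / n for s, v in episode_visits.items()}

def discover_subgoals(density, threshold=0.8, exclude=()):
    """Treat states whose score exceeds `threshold` as subgoal candidates,
    excluding trivial ones such as the start state."""
    return [s for s, d in density.items() if d >= threshold and s not in exclude]

def make_option(subgoal, initiation_set):
    """Package a subgoal as an Option <I, pi, beta>: invocable from any state in
    `initiation_set`, with an internal policy to be learned separately (e.g. by
    Q-learning toward the subgoal) and termination on reaching the subgoal."""
    return {
        "I": frozenset(initiation_set),
        "pi": None,  # filled in by an inner learner
        "beta": lambda s, g=subgoal: 1.0 if s == g else 0.0,
    }
```

For instance, in a two-room grid world the doorway cell is visited by almost every random walk that manages to reach the second room, so it would clear the threshold and become a subgoal; the Option built around it depends only on the environment, not on any particular reward function, which is consistent with the task-independence and cross-task sharing claims in the abstract.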

References (8)

  • 1 Barto A G, Mahadevan S. Recent Advances in Hierarchical Reinforcement Learning. Discrete Event Dynamic Systems: Theory and Applications, 2003, 13(1/2): 41-77
  • 2 Sutton R, Precup D, Singh S. Between MDPs and Semi-MDPs: A Framework for Temporal Abstraction in Reinforcement Learning. Artificial Intelligence, 1999, 112(1/2): 181-211
  • 3 Parr R, Russell S. Reinforcement Learning with Hierarchies of Machines // Jordan M I, Kearns M J, Solla S A, eds. Advances in Neural Information Processing Systems. Cambridge, USA: MIT Press, 1998, 10: 1043-1049
  • 4 Dietterich T G. Hierarchical Reinforcement Learning with the MAXQ Value Function Decomposition. Journal of Artificial Intelligence Research, 2000, 13: 227-303
  • 5 Maron O, Lozano-Perez T. A Framework for Multiple-Instance Learning // Jordan M I, Kearns M J, Solla S A, eds. Advances in Neural Information Processing Systems. Cambridge, USA: MIT Press, 1998, 10: 570-576
  • 6 McGovern E A. Autonomous Discovery of Temporal Abstractions from Interaction with an Environment. Ph.D Dissertation. Amherst, USA: University of Massachusetts, Department of Computer Science, 2002
  • 7 Hengst B. Discovering Hierarchy in Reinforcement Learning with HEXQ // Proc of the 19th International Conference on Machine Learning. Sydney, Australia, 2002: 243-250
  • 8 王本年, 高阳, 陈兆乾, 谢俊元, 陈世福. A k-Clustering Subgoal Discovery Algorithm for Options [J]. 计算机研究与发展 (Journal of Computer Research and Development), 2006, 43(5): 851-855

Secondary References (8)

  • 1 Sutton R S, Precup D, Singh S. Between MDPs and Semi-MDPs: A Framework for Temporal Abstraction in Reinforcement Learning. Artificial Intelligence, 1999, 112(1/2): 181-211
  • 2 Parr R, Russell S. Reinforcement Learning with Hierarchies of Machines // Advances in Neural Information Processing Systems 10. Cambridge, USA: MIT Press, 1998: 1043-1049
  • 3 Dietterich T G. Hierarchical Reinforcement Learning with the MAXQ Value Function Decomposition. Journal of Artificial Intelligence Research, 2000, 13: 227-303
  • 4 Kretchmar R M, Feil T, Bansal R. Improved Automatic Discovery of Subgoals for Options in Hierarchical Reinforcement Learning. Journal of Computer Science and Technology, 2003, 3(2): 9-14
  • 5 Stolle M, Precup D. Learning Options in Reinforcement Learning // Proc of the 5th International Symposium on Abstraction, Reformulation and Approximation. Kananaskis, Canada, 2002
  • 6 McGovern A, Barto A. Automatic Discovery of Subgoals in Reinforcement Learning Using Diverse Density // Proc of the 18th International Conference on Machine Learning. San Francisco, USA: Morgan Kaufmann, 2001: 361-368
  • 7 Maron O, Lozano-Perez T. A Framework for Multiple-Instance Learning // Advances in Neural Information Processing Systems 10. Cambridge, USA: MIT Press, 1998: 570-576
  • 8 Lin L. Self-Improving Agents Based on Reinforcement Learning, Planning and Teaching. Machine Learning, 1992, 8(3): 293-321
