
Option Subgoal Discovery Algorithm Based on Exploration Density

Discovery Algorithm for Option Based on Exploration Density
Abstract: The concept of state exploration density (ED) is proposed. By inspecting how strongly each state influences the agent's ability to explore the environment, the method discovers subgoals for learning and constructs the corresponding Options. A reinforcement learning algorithm that creates Options in this way learns markedly faster, as the simulation results show. The method is task-independent, requires no prior knowledge, and the Options it constructs can be shared directly among different tasks in the same environment.
Source: Pattern Recognition and Artificial Intelligence (《模式识别与人工智能》), 2007, No. 2, pp. 236-240 (5 pages). Indexed in EI, CSCD, and the Peking University Core Journals list.
Keywords: Hierarchical Reinforcement Learning, Option, Exploration Density (ED)
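
The abstract describes the approach only at a high level and does not give the exact formula for exploration density, so the Python sketch below is a hypothetical illustration rather than the paper's algorithm. It assumes ED can be approximated by how often purely exploratory (random-action) trajectories pass through a state, treats high-scoring states as subgoals, and wraps each subgoal as an Option in the Sutton-Precup-Singh sense (initiation set I, internal policy pi, termination condition beta). All function names and parameters here (collect_trajectories, exploration_density, discover_subgoals, make_option, threshold) are invented for this sketch.

```python
from collections import defaultdict
import random

def collect_trajectories(reset, step, actions, n_episodes=200, max_steps=100):
    """Roll out purely exploratory (random-action) episodes.
    `reset()` returns a start state; `step(state, action)` returns the next state."""
    trajectories = []
    for _ in range(n_episodes):
        s = reset()
        traj = [s]
        for _ in range(max_steps):
            s = step(s, random.choice(actions))
            traj.append(s)
        trajectories.append(traj)
    return trajectories

def exploration_density(trajectories):
    """Crude stand-in for ED: the fraction of exploratory episodes that visit
    each state. States the agent must pass through in order to keep exploring
    (e.g. doorways in a grid world) tend to score high."""
    episode_visits = defaultdict(int)
    for traj in trajectories:
        for s in set(traj):  # count each state at most once per episode
            episode_visits[s] += 1
    n = len(trajectories)
    return {s: v / n for s, v in episode_visits.items()}

def discover_subgoals(density, threshold=0.8, exclude=()):
    """Treat states whose score exceeds `threshold` as subgoal candidates,
    excluding trivial ones such as the start state."""
    return [s for s, d in density.items() if d >= threshold and s not in exclude]

def make_option(subgoal, initiation_set):
    """Package a subgoal as an Option <I, pi, beta>: invocable from any state in
    `initiation_set`, with an internal policy to be learned separately (e.g. by
    Q-learning toward the subgoal) and termination on reaching the subgoal."""
    return {
        "I": frozenset(initiation_set),
        "pi": None,  # filled in by an inner learner
        "beta": lambda s, g=subgoal: 1.0 if s == g else 0.0,
    }
```

For instance, in a two-room grid world the doorway cell is visited by almost every random walk that manages to reach the second room, so it would clear the threshold and become a subgoal; the Option built around it depends only on the environment, not on any particular reward function, which is consistent with the task-independence and cross-task sharing claims in the abstract.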

References (8)

  • 1 Barto A G, Mahadevan S. Recent Advances in Hierarchical Reinforcement Learning. Discrete Event Dynamic Systems: Theory and Applications, 2003, 13(1/2): 41-77
  • 2 Sutton R, Precup D, Singh S. Between MDPs and Semi-MDPs: A Framework for Temporal Abstraction in Reinforcement Learning. Artificial Intelligence, 1999, 112(1/2): 181-211
  • 3 Parr R, Russell S. Reinforcement Learning with Hierarchies of Machines // Jordan M I, Kearns M J, Solla S A, eds. Advances in Neural Information Processing Systems. Cambridge, USA: MIT Press, 1998, 10: 1043-1049
  • 4 Dietterich T G. Hierarchical Reinforcement Learning with the MAXQ Value Function Decomposition. Journal of Artificial Intelligence Research, 2000, 13: 227-303
  • 5 Maron O, Lozano-Perez T. A Framework for Multiple-Instance Learning // Jordan M I, Kearns M J, Solla S A, eds. Advances in Neural Information Processing Systems. Cambridge, USA: MIT Press, 1998, 10: 570-576
  • 6 McGovern E A. Autonomous Discovery of Temporal Abstractions from Interaction with an Environment. Ph.D Dissertation. Amherst, USA: University of Massachusetts, Department of Computer Science, 2002
  • 7 Hengst B. Discovering Hierarchy in Reinforcement Learning with HEXQ // Proc of the 19th International Conference on Machine Learning. Sydney, Australia, 2002: 243-250
  • 8 王本年, 高阳, 陈兆乾, 谢俊元, 陈世福. A k-Clustering Subgoal Discovery Algorithm for Options [J]. 计算机研究与发展 (Journal of Computer Research and Development), 2006, 43(5): 851-855

Secondary References (8)

  • 1 Sutton R S, Precup D, Singh S. Between MDPs and Semi-MDPs: A Framework for Temporal Abstraction in Reinforcement Learning. Artificial Intelligence, 1999, 112(1/2): 181-211
  • 2 Parr R, Russell S. Reinforcement Learning with Hierarchies of Machines // Advances in Neural Information Processing Systems 10. Cambridge, USA: MIT Press, 1998: 1043-1049
  • 3 Dietterich T G. Hierarchical Reinforcement Learning with the MAXQ Value Function Decomposition. Journal of Artificial Intelligence Research, 2000, 13: 227-303
  • 4 Kretchmar R M, Feil T, Bansal R. Improved Automatic Discovery of Subgoals for Options in Hierarchical Reinforcement Learning. Journal of Computer Science and Technology, 2003, 3(2): 9-14
  • 5 Stolle M, Precup D. Learning Options in Reinforcement Learning // Proc of the 5th International Symposium on Abstraction, Reformulation and Approximation. Kananaskis, Canada, 2002
  • 6 McGovern A, Barto A. Automatic Discovery of Subgoals in Reinforcement Learning Using Diverse Density // Proc of the 18th International Conference on Machine Learning. San Francisco, USA: Morgan Kaufmann, 2001: 361-368
  • 7 Maron O, Lozano-Perez T. A Framework for Multiple-Instance Learning // Advances in Neural Information Processing Systems 10. Cambridge, USA: MIT Press, 1998: 570-576
  • 8 Lin L. Self-Improving Agents Based on Reinforcement Learning, Planning and Teaching. Machine Learning, 1992, 8(3): 293-321
