Autonomic discovery of subgoals in hierarchical reinforcement learning (Cited by: 1)


Abstract: Options are a promising way to discover hierarchical structure in reinforcement learning (RL) and thereby accelerate learning. The key to option discovery is how an agent can autonomously find useful subgoals among the trajectories it has traversed. Analyzing the agent's actions along these trajectories yields useful heuristics: the agent not only passes through subgoals more frequently, but its effective actions are also restricted at subgoals. Consequently, subgoals can be regarded as the best-matching action-restricted states on the paths. In the grid-world environment, the unique-direction value, which reflects the action-restricted property, is introduced to find these states. The unique-direction-value (UDV) approach is then used to form options autonomously, both offline and online. Experiments show that the approach finds subgoals correctly, and that Q-learning with options found in both the offline and online processes accelerates learning significantly.
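
The two heuristics in the abstract (subgoals are passed frequently, and the actions taken there are restricted) can be illustrated with a minimal Python sketch. This is not the paper's UDV definition, which the abstract does not give. The scoring rule, the function names `score_states` and `pick_subgoal`, and the toy trajectories are all assumptions made for illustration.

```python
from collections import Counter, defaultdict


def score_states(trajectories):
    """Hypothetical score, NOT the paper's UDV formula:
    (fraction of trajectories visiting the state)
    * (share of the most frequent action taken in that state)."""
    visits = Counter()              # trajectories that pass through each state
    actions = defaultdict(Counter)  # per-state counts of the actions taken
    for path in trajectories:
        for state in {s for s, _ in path}:
            visits[state] += 1
        for state, action in path:
            actions[state][action] += 1
    total = len(trajectories)
    scores = {}
    for state, count in visits.items():
        taken = actions[state]
        dominant_share = max(taken.values()) / sum(taken.values())
        scores[state] = (count / total) * dominant_share
    return scores


def pick_subgoal(trajectories):
    """Return the state with the highest hypothetical score."""
    scores = score_states(trajectories)
    return max(scores, key=scores.get)


# Toy two-room grid (assumed layout): cells are (x, y) with y growing
# downward, and (2, 1) is the only doorway between the rooms.
trajectories = [
    [((0, 0), "right"), ((1, 0), "down"), ((1, 1), "right"),
     ((2, 1), "right"), ((3, 1), "up")],
    [((0, 2), "right"), ((1, 2), "up"), ((1, 1), "right"),
     ((2, 1), "right"), ((3, 1), "down")],
    [((1, 1), "up"), ((1, 0), "down"), ((1, 1), "right"),
     ((2, 1), "right"), ((3, 1), "right")],
]

subgoal = pick_subgoal(trajectories)
print(subgoal)  # (2, 1)
```

In this toy data the doorway is visited by every trajectory and always left with the same action, so it scores highest. Neighboring cells are either visited less often or left in several different directions.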

Authors: XIAO Ding, LI Yi-tong, SHI Chuan

Affiliation: School of Computer Science

Source: The Journal of China Universities of Posts and Telecommunications (Chinese title: 中国邮电高校学报（英文版）), indexed in EI and CSCD, 2014, No. 5, pp. 94-104 (11 pages)

Funding: Supported by the National Basic Research Program of China (2013CB329603), the National Natural Science Foundation of China (61375058, 71231002), the China Mobile Research Fund (MCM 20130351), the Ministry of Education of China, and the Special Co-Construction Project of Beijing Municipal Commission of Education.

Keywords: hierarchical reinforcement learning, option, Q-learning, subgoal, UDV

Classification: TP391.41 [Automation and Computer Technology: Computer Application Technology]
