
Autonomic discovery of subgoals in hierarchical reinforcement learning (Cited by: 1)

Abstract: Option is a promising method for discovering the hierarchical structure in reinforcement learning (RL) and thereby accelerating learning. The key to option discovery is how an agent can autonomically find useful subgoals among its passing trails. By analyzing the agent's actions along these trails, useful heuristics can be found: not only does the agent pass through subgoals more frequently, but its effective actions are also restricted at subgoals. Consequently, subgoals can be regarded as the most matching action-restricted states on the paths. In the grid-world environment, the concept of the unique-direction value, which reflects this action-restricted property, was introduced to find the most matching action-restricted states. The unique-direction-value (UDV) approach is used to form options autonomically, both offline and online. Experiments show that the approach finds subgoals correctly, and that Q-learning with options discovered by both the offline and online processes accelerates learning significantly.
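The abstract's core heuristic, picking out frequently visited states whose exits are restricted to essentially one direction, can be illustrated with a short sketch. The Python below is a minimal illustration only, not the paper's implementation: the function name find_subgoal_candidates, its parameters, and the simplified definition of the unique-direction value as the share of visits leaving a state via its most common direction are assumptions made here for clarity; the paper's exact UDV formula and its offline/online option-formation procedure are not reproduced.

from collections import defaultdict, Counter

def find_subgoal_candidates(trajectories, min_visits=10, udv_threshold=0.9):
    """Hypothetical sketch: score grid-world states by a simplified UDV.

    trajectories: list of episodes, each a list of (state, action) pairs,
    where a state is a hashable grid cell and an action a movement direction.
    UDV(s) is approximated here as the fraction of visits to s that left it
    via its single most common direction (the paper's definition may differ).
    """
    visits = Counter()
    exits = defaultdict(Counter)
    for episode in trajectories:
        for state, action in episode:
            visits[state] += 1
            exits[state][action] += 1

    candidates = []
    for state, n in visits.items():
        if n < min_visits:
            continue  # a subgoal should lie on many of the passing trails
        udv = exits[state].most_common(1)[0][1] / n
        if udv >= udv_threshold:
            candidates.append((state, udv, n))
    # most action-restricted and most frequently visited states first
    return sorted(candidates, key=lambda c: (-c[1], -c[2]))

In this reading, the returned candidate states would seed options (for example, as termination states), after which Q-learning over primitive actions plus those options proceeds as usual; these later steps follow the abstract's description, not code from the paper.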
Source: The Journal of China Universities of Posts and Telecommunications (中国邮电高校学报(英文版)), EI / CSCD indexed, 2014, Issue 5: 94-104 (11 pages)
Funding: Supported by the National Basic Research Program of China (2013CB329603), the National Natural Science Foundation of China (61375058, 71231002), the China Mobile Research Fund (MCM 20130351), the Ministry of Education of China, and the Special Co-Construction Project of Beijing Municipal Commission of Education
Keywords: hierarchical reinforcement learning, option, Q-learning, subgoal, UDV

References (1)

Secondary references (8)

  • 1 Sutton R S, Precup D, Singh S. Between MDPs and semi-MDPs: a framework for temporal abstraction in reinforcement learning. Artificial Intelligence, 1999, 112(1/2): 181-211
  • 2 Parr R, Russell S. Reinforcement learning with hierarchies of machines. Proc. of Advances in Neural Information Processing Systems 10. Cambridge, MA: MIT Press, 1998: 1043-1049
  • 3 Dietterich T G. Hierarchical reinforcement learning with the MAXQ value function decomposition. Journal of Artificial Intelligence Research, 2000, 13: 227-303
  • 4 Kretchmar R M, Feil T, Bansal R. Improved automatic discovery of subgoals for options in hierarchical reinforcement learning. Journal of Computer Science and Technology, 2003, 3(2): 9-14
  • 5 Stolle M, Precup D. Learning options in reinforcement learning. Proc. of the 5th Int'l Symposium on Abstraction, Reformulation and Approximation, Kananaskis, Alberta, Canada, 2002
  • 6 McGovern A, Barto A. Automatic discovery of subgoals in reinforcement learning using diverse density. Proc. of the 18th Int'l Conf. on Machine Learning. San Francisco, CA: Morgan Kaufmann, 2001: 361-368
  • 7 Maron O, Lozano-Pérez T. A framework for multiple-instance learning. Proc. of Advances in Neural Information Processing Systems 10. Cambridge, MA: MIT Press, 1998: 570-576
  • 8 Lin L. Self-improving agents based on reinforcement learning, planning and teaching. Machine Learning, 1992, 8(3): 293-321

Co-cited literature (7)

Co-citation literature (13)

  • 1 Sutton R S, Barto A G. Reinforcement learning: an introduction [M]. Cambridge, MA: MIT Press, 1998
  • 2 Kaelbling L P, Littman M L, Moore A W. Reinforcement learning: a survey [EB/OL]. [1996-05-01]. http://www.cs.cmu.edu/afs/cs...vey.html
  • 3 Barto A G, Mahadevan S. Recent advances in hierarchical reinforcement learning [J]. Discrete Event Dynamic Systems, 2003, 13(4): 341-379
  • 4 Simsek O, Wolfe A P, Barto A G. Identifying useful subgoals in reinforcement learning by local graph partitioning [C]// Proceedings of the 22nd International Conference on Machine Learning. USA: ACM, 2005: 816-823
  • 5 Osentoski S, Mahadevan S. Learning state-action basis functions for hierarchical MDPs [C]// Proceedings of the 24th International Conference on Machine Learning. USA: ACM, 2007: 705-712
  • 6 McGovern A, Barto A. Automatic discovery of subgoals in reinforcement learning using diverse density [C]// Proceedings of the 18th International Conference on Machine Learning. San Francisco: Morgan Kaufmann, 2001: 361-368
  • 7 Jong N K, Stone P. State abstraction discovery from irrelevant state variables [C]// IJCAI, 2005: 752-757
  • 8 Konidaris G, Barto A G. Skill discovery in continuous reinforcement learning domains using skill chaining [C]// NIPS, 2009: 1015-1023
  • 9 Konidaris G, Kuindersma S, Barto A G, et al. Constructing skill trees for reinforcement learning agents from demonstration trajectories [C]// NIPS, 2010, 23: 1162-1170
  • 10 Konidaris G, Barto A. Efficient skill learning using abstraction selection [C]// Proceedings of the 21st International Joint Conference on Artificial Intelligence. Pasadena, CA, USA, 2009: 1107-1113

Citing literature (1)

Secondary citing literature (2)
