A random skill discovery algorithm in continuous spaces
Abstract: To address the "curse of dimensionality" in large-scale continuous spaces, where complexity grows exponentially with the state dimension, an improved random skill discovery algorithm is proposed within the Option-based hierarchical reinforcement learning framework. Random Options are defined to generate random skill trees, which together form a random skill tree set. The task goal is decomposed into sub-goals, and low-order Option policies are learned, reducing the exponential growth in learning parameters caused by the growth of the agent's state space. Simulation experiments were conducted on the task of planning the shortest path between two points in a two-dimensional continuous grid space with obstacles. The results show that, because the Options are defined randomly, the algorithm exhibits intermittent instability in its initial performance; however, as the random skill tree set grows, it converges quickly to a near-optimal solution, effectively overcoming the difficulty of obtaining an optimal policy, and the slow convergence, caused by the curse of dimensionality.
Source: Modern Electronics Technique (《现代电子技术》, PKU core journal), 2016, No. 10, pp. 14-17, 20 (5 pages).
Funding: National Natural Science Foundation of China (61303108, 61373094, 61472262); Natural Science Research Project of Jiangsu Higher Education Institutions (13KJB520020); Key Laboratory of Symbolic Computation and Knowledge Engineering of the Ministry of Education, Jilin University (93K172014K04); Jiangsu Provincial Program for Domestic Advanced Visiting Scholars in Higher Vocational Colleges (2014FX058).
Keywords: reinforcement learning; Option; continuous space; random skill discovery
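The approach described in the abstract can be sketched, in simplified form, as SMDP Q-learning over randomly generated Options. This is an illustrative toy on a small discrete grid, not the authors' implementation: the paper works in continuous space, and the grid size, the subgoal-directed option scheme, and all hyperparameters below are assumptions.

```python
import random

random.seed(0)  # reproducible illustration

SIZE = 5       # 5 x 5 grid; states are (row, col) tuples
GOAL = (4, 4)  # task goal: reach the opposite corner

def clip(v):
    return min(max(v, 0), SIZE - 1)

def step(state, action):
    """Apply a primitive move, staying inside the grid."""
    return (clip(state[0] + action[0]), clip(state[1] + action[1]))

def make_random_option():
    """A random Option: walk greedily toward a randomly chosen subgoal cell
    (rows first, then columns), terminating at the subgoal or the task goal."""
    sub = (random.randrange(SIZE), random.randrange(SIZE))
    def policy(s):
        dr = (sub[0] > s[0]) - (sub[0] < s[0])
        dc = (sub[1] > s[1]) - (sub[1] < s[1])
        return (dr, 0) if dr else (0, dc)
    return {"subgoal": sub, "policy": policy}

OPTIONS = [make_random_option() for _ in range(8)]  # the random "skill set"
GAMMA, ALPHA, EPS = 0.95, 0.5, 0.2                  # assumed hyperparameters
Q = {}                                              # (state, option idx) -> value

def run_option(s, opt):
    """Execute an option; return (next state, discounted reward sum, duration)."""
    total, disc, k = 0.0, 1.0, 0
    while s != opt["subgoal"] and s != GOAL and k < 2 * SIZE:
        s = step(s, opt["policy"](s))
        total += disc * (10.0 if s == GOAL else -1.0)  # step cost, goal bonus
        disc *= GAMMA
        k += 1
    return s, total, max(k, 1)

for episode in range(300):
    s, steps = (0, 0), 0
    while s != GOAL and steps < 100:       # cap in case no option reaches the goal
        if random.random() < EPS:          # epsilon-greedy over the option set
            i = random.randrange(len(OPTIONS))
        else:
            i = max(range(len(OPTIONS)), key=lambda j: Q.get((s, j), 0.0))
        s2, reward, k = run_option(s, OPTIONS[i])
        best_next = 0.0 if s2 == GOAL else max(
            Q.get((s2, j), 0.0) for j in range(len(OPTIONS)))
        q = Q.get((s, i), 0.0)
        # SMDP Q-update: discount by gamma**k for the option's duration k
        Q[(s, i)] = q + ALPHA * (reward + GAMMA ** k * best_next - q)
        s, steps = s2, steps + k
```

Because the subgoals are drawn at random, early episodes may fail to reach the goal at all, mirroring the intermittent initial instability reported in the abstract; enlarging the set of random options (or, in the paper, the random skill tree set) improves coverage and convergence.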
