Abstract
In large-scale continuous spaces, the state space grows exponentially with the number of state dimensions. This is the "curse of dimensionality". To address it, an improved random skill discovery algorithm is proposed, built on the Option framework of hierarchical reinforcement learning. Randomly defined Options are used to generate random skill trees, and these together form a random skill tree set. The task goal is divided into subgoals. Learning low-level Option policies then limits the exponential growth in the number of learning parameters that otherwise accompanies growth in the agent's scale. The algorithm was evaluated in simulation on a shortest-path planning task between two points in a two-dimensional continuous grid space with obstacles. The results show some intermittent instability in early performance, because the Options are defined randomly. As the random skill tree set grows, however, the algorithm converges quickly to an approximately optimal solution. It thereby avoids the difficulty of obtaining an optimal policy, and the slow convergence, that the curse of dimensionality otherwise causes.
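The abstract gives no pseudocode. The following is only a minimal Python sketch of the general idea it builds on: Options (temporally extended actions) defined at random, with a top-level policy learned over primitive actions plus those Options using SMDP Q-learning on a small grid maze. Everything here is an illustrative assumption and not the authors' algorithm:

- the discrete 6x6 maze, which stands in for the paper's continuous space;
- the names `make_random_option`, `run_option`, `smdp_q_learning` and `greedy_rollout`;
- the hyperparameters;
- the simplification that each Option's internal policy is a precomputed shortest-path controller to a random subgoal, rather than a learned low-level policy.

The random skill tree construction itself is not reproduced.

```python
import random
from collections import deque

# Hypothetical 6x6 maze (not from the paper): '#' = obstacle, 'S' = start, 'G' = goal.
GRID = ["S..#..",
        ".#.#..",
        ".#...#",
        ".##.#.",
        "....#.",
        "#.#..G"]
ROWS, COLS = len(GRID), len(GRID[0])
FREE = [(r, c) for r in range(ROWS) for c in range(COLS) if GRID[r][c] != "#"]
START, GOAL = (0, 0), (5, 5)
MOVES = [(-1, 0), (1, 0), (0, -1), (0, 1)]  # primitive actions: up, down, left, right


def step(s, a):
    """Deterministic move; hitting a wall or the border leaves the agent in place."""
    r, c = s[0] + MOVES[a][0], s[1] + MOVES[a][1]
    return (r, c) if 0 <= r < ROWS and 0 <= c < COLS and GRID[r][c] != "#" else s


def bfs_dist(target):
    """Shortest-path distance (in steps) from every reachable free cell to `target`."""
    dist, queue = {target: 0}, deque([target])
    while queue:
        s = queue.popleft()
        for a in range(4):
            n = step(s, a)
            if n not in dist:
                dist[n] = dist[s] + 1
                queue.append(n)
    return dist


def make_random_option(rng):
    """Hypothetical random Option: head for a randomly drawn subgoal cell, then stop."""
    subgoal = rng.choice(FREE)
    dist = bfs_dist(subgoal)

    def policy(s):
        # Simplification: pick the move that most reduces the distance to the subgoal.
        return min(range(4), key=lambda a: dist.get(step(s, a), float("inf")))

    return {"subgoal": subgoal, "policy": policy}


def run_option(s, option, gamma, max_len=50):
    """Run an Option until its subgoal (or GOAL) is reached.

    Returns (next_state, discounted_reward, duration_in_primitive_steps)."""
    ret, k = 0.0, 0
    while s not in (option["subgoal"], GOAL) and k < max_len:
        s = step(s, option["policy"](s))
        ret -= gamma ** k  # reward -1 per primitive step, discounted inside the Option
        k += 1
    return s, ret, k


def smdp_q_learning(n_options=4, episodes=2000, alpha=0.5, gamma=0.95, eps=0.1, seed=0):
    """Top-level SMDP Q-learning over 4 primitive actions plus `n_options` random Options."""
    rng = random.Random(seed)
    options = [make_random_option(rng) for _ in range(n_options)]
    Q = {s: [0.0] * (4 + n_options) for s in FREE}

    def available(s):
        # An Option may be started anywhere except at its own subgoal.
        return list(range(4)) + [4 + i for i, o in enumerate(options) if o["subgoal"] != s]

    for _ in range(episodes):
        s, t = START, 0
        while s != GOAL and t < 200:
            acts = available(s)
            a = rng.choice(acts) if rng.random() < eps else max(acts, key=lambda x: Q[s][x])
            if a < 4:
                s2, r, k = step(s, a), -1.0, 1
            else:
                s2, r, k = run_option(s, options[a - 4], gamma)
            # SMDP target: a k-step Option discounts the successor value by gamma**k.
            target = r if s2 == GOAL else r + gamma ** k * max(Q[s2][x] for x in available(s2))
            Q[s][a] += alpha * (target - Q[s][a])
            s, t = s2, t + k
    return Q, options, available


def greedy_rollout(Q, options, available, gamma=0.95, limit=100):
    """Follow the learned greedy policy from START; return primitive steps used, or None."""
    s, steps = START, 0
    while s != GOAL and steps < limit:
        a = max(available(s), key=lambda x: Q[s][x])
        if a < 4:
            s, k = step(s, a), 1
        else:
            s, _, k = run_option(s, options[a - 4], gamma)
        steps += k
    return steps if s == GOAL else None


if __name__ == "__main__":
    Q, options, available = smdp_q_learning()
    print("subgoals:", [o["subgoal"] for o in options])
    print("greedy path length:", greedy_rollout(Q, options, available))
```

Changing `seed` changes which subgoals are drawn. The abstract attributes its method's early-stage instability to this kind of random definition of Options.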
Source
Modern Electronics Technique (《现代电子技术》)
Peking University core journal
2016, No. 10, pp. 14-17, 20 (5 pages)
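Funding
National Natural Science Foundation of China (61303108, 61373094, 61472262)
Natural Science Research Project of Higher Education Institutions of Jiangsu Province (13KJB520020)
Key Laboratory of Symbolic Computation and Knowledge Engineering of the Ministry of Education, Jilin University (93K172014K04)
Jiangsu Province Domestic Senior Visiting Scholar Program for Higher Vocational Colleges (2014FX058)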