Abstract
Existing automatic hierarchy approaches in hierarchical reinforcement learning build the hierarchical structure in a single pass, after the state space has been explored to some extent. Insufficient exploration cannot guarantee the quality of the solution, while excessive exploration slows learning. To overcome this strong dependence of learning performance on the degree of state-space exploration, this paper proposes a dynamic hierarchy approach. The approach integrates immune clustering and a secondary-response mechanism into the Options framework for hierarchical reinforcement learning proposed by Sutton, so that the state space of each Option can be adjusted dynamically and the internal policies of Options can be generated dynamically along the learning trajectory. Simulation experiments on shortest-path planning between two points in a two-dimensional grid space with obstacles show that the dynamic hierarchy approach depends very little on the degree of state-space exploration and is better suited to large-scale reinforcement learning problems.
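To make the Options setting referred to in the abstract concrete, the following is a minimal, illustrative sketch of SMDP Q-learning over options on a two-dimensional grid with obstacles. The grid layout, the two hand-written options, and all hyper-parameters are assumptions for illustration only; the paper's immune-clustering-based dynamic hierarchy and secondary-response mechanism are not reproduced here.

```python
# Sketch only: SMDP Q-learning over hand-written Options on a grid with a wall.
# Layout, options, and hyper-parameters are hypothetical, not the paper's method.
import random
from dataclasses import dataclass
from typing import Callable, Tuple

State = Tuple[int, int]
Move = Tuple[int, int]

@dataclass
class Option:
    """An option = (initiation set, internal policy, termination condition)."""
    name: str
    can_start: Callable[[State], bool]
    policy: Callable[[State], Move]
    should_stop: Callable[[State], bool]

GRID_W, GRID_H = 8, 8
OBSTACLES = {(3, y) for y in range(1, 8)}   # wall in column 3, gap at y == 0
GOAL = (7, 7)
GAP = (3, 0)

def step(s: State, move: Move) -> State:
    nxt = (s[0] + move[0], s[1] + move[1])
    if nxt in OBSTACLES or not (0 <= nxt[0] < GRID_W and 0 <= nxt[1] < GRID_H):
        return s                             # blocked moves leave the state unchanged
    return nxt

def greedy_towards(target: State) -> Callable[[State], Move]:
    """Internal policy: step along x first, then along y, towards a landmark."""
    def pol(s: State) -> Move:
        dx = (target[0] > s[0]) - (target[0] < s[0])
        dy = (target[1] > s[1]) - (target[1] < s[1])
        return (dx, 0) if dx else (0, dy)
    return pol

options = [
    Option("to-gap", lambda s: s != GAP, greedy_towards(GAP), lambda s: s == GAP),
    Option("to-goal", lambda s: True, greedy_towards(GOAL), lambda s: s == GOAL),
]

Q = {}                                       # Q[(state, option index)], default 0.0
ALPHA, GAMMA, EPS = 0.1, 0.95, 0.1

def run_episode(start: State = (0, 0), max_steps: int = 200) -> None:
    s, steps = start, 0
    while s != GOAL and steps < max_steps:
        admissible = [i for i, o in enumerate(options) if o.can_start(s)]
        if random.random() < EPS:            # epsilon-greedy over admissible options
            i = random.choice(admissible)
        else:
            i = max(admissible, key=lambda j: Q.get((s, j), 0.0))
        o, s0, r, k = options[i], s, 0.0, 0
        while not o.should_stop(s) and steps < max_steps:
            s = step(s, o.policy(s))         # follow the option's internal policy
            r += (GAMMA ** k) * (-1.0)       # reward of -1 per primitive step
            k, steps = k + 1, steps + 1
        # SMDP Q-learning update: the bootstrap term is discounted by GAMMA ** k
        best_next = 0.0 if s == GOAL else max(Q.get((s, j), 0.0) for j in range(len(options)))
        old = Q.get((s0, i), 0.0)
        Q[(s0, i)] = old + ALPHA * (r + (GAMMA ** k) * best_next - old)

for _ in range(300):
    run_episode()
print({o.name: round(Q.get(((0, 0), i), 0.0), 2) for i, o in enumerate(options)})
```

In the static setting sketched above, the set of options is fixed before learning; the paper's contribution is to let the option state spaces and internal policies be revised during learning rather than fixed after a one-off exploration phase.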
Source
Journal of Chinese Computer Systems (《小型微型计算机系统》), indexed in CSCD and the Peking University Core Journals list
2007, No. 2, pp. 287-291 (5 pages)
Funding
Supported by the National Defense Basic Research Program and the Fundamental Research Foundation of Harbin Engineering University (HEUFT05021, HEUFT05068).
Keywords
hierarchical reinforcement learning
dynamic hierarchy
immune clustering
secondary response