Abstract
Transferring the subtasks (options) learned by a hierarchical reinforcement learning algorithm to related tasks has attracted rapidly growing interest in recent years. In control systems, however, an option is hard to reuse because it depends on the system's parameter values. To address this problem, this paper proposes a hierarchical Option algorithm based on qualitative actions. In systems that differ only in their parameter values, the optimal actions for the same state differ, but they usually share common characteristics; the algorithm describes these characteristics with qualitative actions. The algorithm also builds hierarchical options: high-level options describe the state-to-qualitative-action model, and low-level options shield the high-level options from the effect of the system parameters. The high-level options can therefore be reused in systems with different parameter values with only minor revision. Experimental results on inverted pendulum control show that, starting from learned high-level options, the algorithm needs only a small amount of additional learning to control inverted pendulum systems with different parameter values.
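The split between high-level and low-level options described in the abstract can be illustrated with a minimal sketch. This is not the paper's implementation: the abstract does not specify the state discretization, the set of qualitative actions, or how the low level is learned. The names `qualitative_state`, `high_level_policy`, `LowLevelOption`, and `gain`, the three-valued action set, and the sign convention (positive angle means the pole leans right) are all assumptions made for illustration.

```python
from dataclasses import dataclass

# Assumed qualitative actions: only the direction of the push, not its size.
PUSH_LEFT, HOLD, PUSH_RIGHT = -1, 0, 1


def qualitative_state(theta, theta_dot, eps=0.01):
    """Reduce pole angle and angular velocity to signs in {-1, 0, 1}."""
    def sign(x):
        if abs(x) < eps:
            return 0
        return 1 if x > 0 else -1
    return sign(theta), sign(theta_dot)


def high_level_policy(theta, theta_dot):
    """Hypothetical state -> qualitative action rule.

    It contains no system parameters, so it can be shared across
    pendulums with different masses or lengths.
    """
    s_theta, s_omega = qualitative_state(theta, theta_dot)
    # Push toward the side the pole leans to; fall back on the
    # direction it is rotating when it is (nearly) upright.
    return s_theta if s_theta != 0 else s_omega


@dataclass
class LowLevelOption:
    """Maps a qualitative action to a concrete force for one system."""
    gain: float  # assumed per-system magnitude, relearned for each system

    def force(self, qualitative_action):
        return self.gain * qualitative_action


# Two systems share the high-level policy and differ only in the low level.
heavy = LowLevelOption(gain=10.0)
light = LowLevelOption(gain=2.5)
action = high_level_policy(theta=0.1, theta_dot=0.0)
forces = (heavy.force(action), light.force(action))
```

In this sketch only `gain` would have to be relearned when the pendulum's parameters change, which mirrors the abstract's claim that a small amount of additional learning suffices once the high-level options are in place.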
Source
Coal Technology (《煤炭技术》)
CAS
PKU Core Journal (北大核心)
2014, Issue 1, pp. 24-26 (3 pages)
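Funding
Beijing Municipal Program for Strengthening Education through Talent (人才强教深化计划), "Academic Innovation Team" project (PHR201007130)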
Keywords
reinforcement learning
qualitative action
inverted pendulum