Abstract
In reinforcement learning research, common knowledge transfer methods obtain knowledge by extracting features of a system's optimal policy. Because the knowledge obtained this way usually depends on the system parameters, these methods are difficult to apply to tasks whose state transition probabilities vary with those parameters. To address this problem, this paper proposes a hierarchical Option algorithm based on qualitative fuzzy networks. The algorithm describes the system's suboptimal policies with qualitative actions and uses a qualitative fuzzy network to extract the common features of these suboptimal policies, yielding parameter-independent knowledge that can be transferred. Control experiments on inverted pendulum systems show that the qualitative fuzzy network can effectively represent the control rules shared by inverted pendulum systems with different parameter values and obtain knowledge that is independent of the system parameters, extending common knowledge transfer methods from parameter-independent tasks to parameter-dependent ones.
Source
《信息与控制》
CSCD
Peking University Core Journal (北大核心)
2009, No. 6, pp. 673-679 (7 pages)
Information and Control
Funding
National High-Tech R&D Program of China (863 Program) (2007AA01Z168)
National Natural Science Foundation of China (60805041, 60872082)
Keywords
reinforcement learning
qualitative action
inverted pendulum