Abstract
To address the high computational complexity and slow convergence of online approximate policy iteration reinforcement learning, this paper introduces a CMAC structure as the value function approximator and proposes a nonparametric approximate policy iteration reinforcement learning algorithm based on CMAC (NPAPI-CMAC). The algorithm determines the CMAC generalization parameter by constructing a sample collection process, determines the CMAC state partition through an initial partition and an extended partition, and defines the reinforcement learning rate by building the set of sample counts per tiling from the quantization coding structure, so that the structure and parameters of the reinforcement learner are constructed fully automatically. In addition, the algorithm adaptively adjusts the reinforcement learning parameters during learning using the delta rule and the nearest-neighbor idea, and uses a greedy policy to select among the actions produced by the action voter. Simulation results on the balancing control of a single inverted pendulum verify the effectiveness, robustness, and rapid convergence of the proposed algorithm.
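The abstract describes CMAC tile coding as the value function approximator, a delta-rule update, and greedy selection over action values. The following is a minimal Python sketch of that general technique only; the class name, tiling layout, step size, and all parameters are illustrative assumptions and do not reproduce the NPAPI-CMAC construction (sample collection, automatic partitioning, or the action voter) from the paper.

```python
# Minimal CMAC (tile-coding) action-value approximator with a delta-rule
# update and greedy action selection. Illustrative sketch only; names and
# parameters are assumptions, not the paper's NPAPI-CMAC algorithm.
import numpy as np

class CMACQ:
    def __init__(self, n_tilings=8, bins_per_dim=10,
                 state_low=None, state_high=None, n_actions=3, alpha=0.1):
        self.n_tilings = n_tilings
        self.bins = bins_per_dim
        self.low = np.asarray(state_low, dtype=float)
        self.high = np.asarray(state_high, dtype=float)
        self.n_actions = n_actions
        self.alpha = alpha / n_tilings            # per-tiling step size
        dims = len(self.low)
        # one weight table per tiling: bins^dims tiles x actions
        self.w = np.zeros((n_tilings, bins_per_dim ** dims, n_actions))
        # each tiling is shifted by a fraction of one tile width
        self.offsets = (np.arange(n_tilings)[:, None] / n_tilings) * \
                       (self.high - self.low) / bins_per_dim

    def _tiles(self, s):
        """Return the index of the active tile in every tiling for state s."""
        s = np.asarray(s, dtype=float)
        idx = []
        for t in range(self.n_tilings):
            ratios = (s + self.offsets[t] - self.low) / (self.high - self.low)
            coords = np.clip((ratios * self.bins).astype(int), 0, self.bins - 1)
            flat = 0
            for c in coords:                      # row-major flattening
                flat = flat * self.bins + c
            idx.append(flat)
        return idx

    def q(self, s):
        """Q(s, a) for all actions: sum of the weights of the active tiles."""
        return sum(self.w[t, i] for t, i in enumerate(self._tiles(s)))

    def update(self, s, a, target):
        """Delta rule: move Q(s, a) toward the given target value."""
        delta = target - self.q(s)[a]
        for t, i in enumerate(self._tiles(s)):
            self.w[t, i, a] += self.alpha * delta

    def greedy_action(self, s):
        """Greedy selection over the approximated action values."""
        return int(np.argmax(self.q(s)))
```

Because only one tile per tiling is active for any state, each evaluation and update touches a fixed, small number of weights, which is the usual reason a CMAC approximator keeps per-step computational cost low in online reinforcement learning.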
Authors
JI Ting; ZHANG Hua (Robotics Institute, Nanchang University, Nanchang 330031, China)
Source
Computer Engineering and Applications (《计算机工程与应用》)
Indexed in CSCD; Peking University core journal list
2019, No. 2, pp. 128-136 (9 pages)
Funding
National High Technology Research and Development Program of China (863 Program) (No. SS2013AA041003)
Keywords
reinforcement learning
Cerebellar Model Articulation Controller (CMAC)
nonparametric
inverted pendulum