Abstract
To address the "curse of modeling" and the "curse of dimensionality" in dynamic programming, action-dependent heuristic dynamic programming (ADHDP), which requires neither a mathematical model nor a known optimal control, has been proposed. The action network and critic network of ADHDP are trained with the supervised error back-propagation (BP) algorithm, but BP converges slowly. Building on this, the paper uses radial basis function neural networks (RBFNN) for both the action network and the critic network, with gradient descent as the online learning algorithm, to improve the ADHDP control algorithm. Simulation on an inverted-pendulum learning-control model verifies that the improved ADHDP algorithm has good control performance and robustness.
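The abstract names the building block of the improved controller: an RBF network whose parameters are adjusted online by gradient descent. The following is a minimal sketch of that building block only, not the authors' implementation. The class name `RBFNetwork`, the fixed centers, the shared Gaussian width, the learning rate, and the choice to train only the output weights are all assumptions made for illustration; the paper may also adapt centers and widths, and in ADHDP the training target would come from the critic's error signal rather than a fixed value.

```python
import math


class RBFNetwork:
    """Gaussian RBF network with a linear output layer (illustrative sketch)."""

    def __init__(self, centers, width, lr=0.1):
        self.centers = [list(c) for c in centers]  # fixed centers (assumption)
        self.width = width                         # shared Gaussian width (assumption)
        self.weights = [0.0] * len(self.centers)   # output weights, trained online
        self.lr = lr                               # learning rate (assumption)

    def hidden(self, x):
        # Gaussian activation of each hidden unit for the input vector x
        return [
            math.exp(-sum((xi - ci) ** 2 for xi, ci in zip(x, c))
                     / (2 * self.width ** 2))
            for c in self.centers
        ]

    def predict(self, x):
        return sum(w * h for w, h in zip(self.weights, self.hidden(x)))

    def update(self, x, target):
        # One gradient-descent step on E = 0.5 * (target - output)^2.
        # dE/dw_j = -(target - output) * h_j, so w_j += lr * error * h_j.
        h = self.hidden(x)
        error = target - sum(w * hi for w, hi in zip(self.weights, h))
        self.weights = [w + self.lr * error * hi
                        for w, hi in zip(self.weights, h)]
        return error  # error measured before the weight update


# Usage: repeated online updates shrink the error at a single sample.
net = RBFNetwork(centers=[[-1.0], [0.0], [1.0]], width=0.5, lr=0.5)
first_error = abs(net.update([0.0], 1.0))
for _ in range(50):
    last_error = abs(net.update([0.0], 1.0))
```

In an ADHDP setting, one such network would play the role of the critic and another the action network, each updated by a step of this form with its own error definition.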
Authors
Liang Yingbo (梁英波)
Zhang Lihong (张利红)
LIANG Yingbo; ZHANG Lihong (Dean's Office, Zhoukou Normal University, Zhoukou 466001, China; Department of Physics and Telecommunication Engineering, Zhoukou Normal University, Zhoukou 466001, China)
Source
Journal of Zhoukou Normal University (《周口师范学院学报》)
CAS
2017, No. 5, pp. 46-49 (4 pages)
Funding
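Key Scientific Research Project of Higher Education Institutions of Henan Province (No. 16B510009)
Education and Teaching Reform Research Project of Zhoukou Normal University (No. J2016050)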