Abstract
Based on reinforcement learning and adaptive dynamic programming, a two-loop autopilot for miniature munitions is designed, and a linear control model of the longitudinal channel of a miniature munition is established. Because the optimal solution of the tracker problem is difficult to obtain directly, the system state matrix and the expected output (reference) signal are combined into an augmented system. A discount factor is then introduced, which transforms the tracker design problem into a regulator design problem. Based on the Bellman optimality principle, the Riccati equation is solved by policy iteration, and the convergence of the iterative algorithm is proved. Simulation results show that the two-step iteration of policy evaluation and policy update converges to the optimal solution of the tracker.
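The abstract does not give the model or say whether the design is in continuous or discrete time, so the following is only a minimal NumPy sketch of the general technique it names, under assumptions of our own. We assume a discrete-time plant x_{k+1} = A x_k + B u_k with output y_k = C x_k, a reference generated by r_{k+1} = F r_k, and the discounted cost sum_k gamma^k [(y_k - r_k)' Q (y_k - r_k) + u_k' R u_k]. Stacking X = [x; r] gives an augmented pair (T, B1) with C1 = [C, -I], so the tracking cost becomes a regulator cost with weight Q1 = C1' Q C1. A Hewer-type policy iteration then alternates policy evaluation (solving a discounted Lyapunov equation for P_j) with the policy update K_{j+1} = gamma (R + gamma B1' P_j B1)^{-1} B1' P_j T, where u = -K X. The plant matrices, weights, gamma = 0.9 and all function names are hypothetical; this is not the paper's munition model or code.

```python
import numpy as np


def augment(A, B, C, F):
    """Stack the plant (A, B, C) with a reference generator r_{k+1} = F r_k."""
    n, m = B.shape
    p = F.shape[0]
    T = np.block([[A, np.zeros((n, p))],
                  [np.zeros((p, n)), F]])
    B1 = np.vstack([B, np.zeros((p, m))])
    C1 = np.hstack([C, -np.eye(p)])  # tracking error e_k = y_k - r_k = C1 @ X_k
    return T, B1, C1


def evaluate_policy(T, B1, Q1, R, K, gamma):
    """Policy evaluation: solve P = Q1 + K'RK + gamma (T - B1 K)' P (T - B1 K) exactly."""
    Acl = T - B1 @ K
    N = T.shape[0]
    rhs = (Q1 + K.T @ R @ K).flatten()
    # Vectorized discounted Lyapunov equation: (I - gamma kron(Acl', Acl')) vec(P) = vec(rhs)
    lhs = np.eye(N * N) - gamma * np.kron(Acl.T, Acl.T)
    P = np.linalg.solve(lhs, rhs).reshape(N, N)
    return 0.5 * (P + P.T)  # symmetrize against round-off


def improve_policy(T, B1, R, P, gamma):
    """Policy update: K = gamma (R + gamma B1'PB1)^{-1} B1'PT, applied as u = -K X."""
    return gamma * np.linalg.solve(R + gamma * B1.T @ P @ B1, B1.T @ P @ T)


def policy_iteration(T, B1, Q1, R, K0, gamma, tol=1e-10, max_iter=100):
    """Alternate evaluation and update from an admissible K0 until the gain stops changing."""
    K, history = K0, []
    for _ in range(max_iter):
        P = evaluate_policy(T, B1, Q1, R, K, gamma)
        history.append(P)
        K_next = improve_policy(T, B1, R, P, gamma)
        if np.linalg.norm(K_next - K) < tol:
            return P, K_next, history
        K = K_next
    return P, K, history


def riccati_residual(T, B1, Q1, R, P, gamma):
    """Norm of the discounted algebraic Riccati equation residual at P."""
    G = R + gamma * B1.T @ P @ B1
    rhs = (Q1 + gamma * T.T @ P @ T
           - gamma**2 * T.T @ P @ B1 @ np.linalg.solve(G, B1.T @ P @ T))
    return np.linalg.norm(P - rhs)


if __name__ == "__main__":
    # Hypothetical discrete-time plant, NOT the paper's longitudinal munition model.
    A = np.array([[0.9, 0.1], [-0.05, 0.8]])
    B = np.array([[0.0], [0.1]])
    C = np.array([[1.0, 0.0]])
    F = np.array([[1.0]])  # constant (step) reference
    gamma, Q, R = 0.9, np.array([[10.0]]), np.array([[1.0]])

    T, B1, C1 = augment(A, B, C, F)
    Q1 = C1.T @ Q @ C1
    K0 = np.zeros((1, 3))  # admissible here only because sqrt(gamma) * T is Schur stable
    P, K, history = policy_iteration(T, B1, Q1, R, K0, gamma)
    print("iterations:", len(history))
    print("K =", K)
    print("Riccati residual:", riccati_residual(T, B1, Q1, R, P, gamma))
```

With F = 1 the reference mode is marginally stable and cannot be influenced by u, so without gamma < 1 the augmented cost would generally be unbounded; in this sketch that is why the discount factor is what lets the tracker be posed as a regulator. Policy iteration must start from a stabilizing gain; K0 = 0 works only for this particular hypothetical plant.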
Authors
Fan Junfang; Zhang Xin (Beijing Key Laboratory of High Dynamic Navigation Technology, Beijing Information Science and Technology University, Beijing 100101, China)
Source
Tactical Missile Technology (《战术导弹技术》)
Peking University core journal
2019, No. 4, pp. 48-54 (7 pages)
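Funding
Beijing Nova Program of Science and Technology (xxjh2015B041)
Young Top-Notch Talent Program of the Organization Department of the Beijing Municipal Party Committee (2015000026833ZK03)
Young Top-Notch Talent Project of the Beijing Municipal Education Commission (CIT&TCD201504055)
Open Project of the Beijing Key Laboratory of High Dynamic Navigation Technology (HDN2018002)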
Keywords
miniature munition
two-loop autopilot
policy iteration
algebraic Riccati equation
reinforcement learning
Bellman optimality principle