Abstract
In practical control applications, system states often cannot be measured directly, or their measurement is costly, which makes it challenging to learn an optimal controller entirely from state data when the model parameters are unknown. To solve this problem, a discrete-time linear augmented system with a state observer and unknown parameters in the system matrices is first constructed, and the performance index to be optimized is defined. Then, based on the separation theorem, dynamic programming, and the Q-learning method, an off-policy Q-learning algorithm for systems with unknown model parameters is developed and a near-optimal observer is designed, yielding an off-policy Q-learning algorithm that uses only measurable system output and control input data and realizes optimal control via observer-based state feedback. The advantages of the algorithm are that it requires neither all of the system model parameters to be known nor the system state to be directly measurable, and it optimizes the prescribed performance index using only measurable data. Finally, simulation experiments verify the effectiveness of the proposed method.
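The observer-based off-policy Q-learning scheme summarized above can be sketched in code. The following is a minimal illustrative example, not the paper's algorithm: the plant matrices (A, B, C), observer gain L, noise level, and iteration counts are all hypothetical, and the Q-function kernel is estimated by batch least squares on the Q-function Bellman equation using only input and output (observer-state) data.

```python
import numpy as np

# Hypothetical second-order plant (illustrative only; not the paper's example).
A = np.array([[0.9, 0.2], [0.0, 0.8]])
B = np.array([[0.0], [1.0]])
C = np.array([[1.0, 0.0]])
Q, R = np.eye(2), np.array([[1.0]])
L = np.array([[0.5], [0.3]])          # observer gain; A - L @ C is Schur stable
n, m = 2, 1
rng = np.random.default_rng(0)

def phi(z):
    """Quadratic basis: phi(z) @ theta == z' H z for symmetric H."""
    i, j = np.triu_indices(z.size)
    return np.where(i == j, 1.0, 2.0) * np.outer(z, z)[i, j]

def unvech(theta, d):
    """Rebuild the symmetric Q-function kernel H from its upper-triangular entries."""
    H = np.zeros((d, d))
    i, j = np.triu_indices(d)
    H[i, j] = theta
    H[j, i] = theta
    return H

K = np.zeros((m, n))                  # initial gain (A is stable, so K = 0 is admissible)
x, xhat = rng.normal(size=n), np.zeros(n)
for _ in range(10):                   # policy iteration on the Q-function kernel
    rows, rhs = [], []
    for _ in range(200):              # data from an exploratory behavior policy
        u = -K @ xhat + 0.1 * rng.normal(size=m)
        r = xhat @ Q @ xhat + u @ R @ u
        y = C @ x                     # only the output y is assumed measurable
        xhat_next = A @ xhat + B @ u + L @ (y - C @ xhat)  # Luenberger observer
        x = A @ x + B @ u
        z, z_next = np.r_[xhat, u], np.r_[xhat_next, -K @ xhat_next]
        rows.append(phi(z) - phi(z_next))   # Bellman: Q(z) = r + Q(z_next)
        rhs.append(r)
        xhat = xhat_next
    theta = np.linalg.lstsq(np.array(rows), np.array(rhs), rcond=None)[0]
    H = unvech(theta, n + m)
    K = np.linalg.solve(H[n:, n:], H[n:, :n])   # policy improvement: u = -K xhat

# Model-based check: fixed-point iteration on the discrete-time Riccati equation.
P = Q.copy()
for _ in range(500):
    P = Q + A.T @ P @ A - A.T @ P @ B @ np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)
K_star = np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)
print("learned K:", K)
print("model-based K*:", K_star)
```

The evaluation step exploits the off-policy structure: the Q-function Bellman equation holds for arbitrary applied inputs u, so exploratory data collected under one (behavior) policy can evaluate the current target policy K. Because A - LC is stable, the observer state converges to the plant state, and the gain learned from observer-state data approaches the model-based optimal gain.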
Authors
LI Jin-na; MA Shi-kai
(College of Information Engineering, Shenyang University of Chemical Technology, Shenyang 110142, China; State Key Lab of Synthetical Automation for Process Industries, Northeastern University, Shenyang 110004, China)
Source
Control and Decision (《控制与决策》)
EI
CSCD
Peking University Core Journal (北大核心)
2020, No. 12, pp. 2889-2897 (9 pages)
Funding
National Natural Science Foundation of China (61673280)
Program for Innovative Talents in Universities of Liaoning Province (LR2017006).
Keywords
off-policy Q-learning
optimal control
state observer
separation theorem
discrete-time systems
approximate dynamic programming