Abstract

For stochastic linear discrete-time systems, a Q-learning algorithm is proposed in this paper to solve the infinite-horizon stochastic linear quadratic optimal tracking (SLQT) control problem. First, the reference signal to be tracked is assumed to be generated by a command generator, and an augmented system consisting of the original stochastic system and the reference trajectory system is established, which transforms the optimal tracking problem into an optimal regulation problem. Second, to solve the optimal tracking problem of the stochastic system online, the stochastic system is converted into a deterministic one, and the Q-function of the stochastic linear quadratic optimal tracking control is defined on the augmented system, so that the augmented stochastic algebraic equation (GSAE) can be solved online without knowledge of the system model parameters. Third, the equivalence between the Q-learning algorithm and the augmented stochastic algebraic equation is proved, and the implementation steps of the Q-learning algorithm are given. Finally, a simulation example is given to illustrate the effectiveness of the proposed Q-learning algorithm.
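The scheme summarized in the abstract (augmented state, a data-driven fit of the Q-function kernel, then policy improvement) can be sketched as follows. This is an illustrative sketch only, not the authors' exact algorithm: it uses a deterministic linear tracking example, and all plant matrices, weights, and dimensions below are hypothetical example data, since the paper's stochastic-to-deterministic conversion and GSAE details are not reproduced here.

```python
import numpy as np

# Hedged illustrative sketch (not the paper's exact method): model-free
# Q-learning via policy iteration for an infinite-horizon linear quadratic
# tracking problem on an augmented state z = [x; r]. All matrices below
# are hypothetical example data.
rng = np.random.default_rng(0)

A = np.array([[0.9, 0.1],
              [0.0, 0.8]])        # plant: x_{k+1} = A x_k + B u_k
B = np.array([[0.0],
              [1.0]])
F = np.array([[0.95]])            # command generator: r_{k+1} = F r_k
C = np.array([[1.0, 0.0]])        # tracked output: y_k = C x_k

n, m, p = 2, 1, 1
T = np.block([[A, np.zeros((n, p))],
              [np.zeros((p, n)), F]])      # augmented z_{k+1} = T z_k + B1 u_k
B1 = np.vstack([B, np.zeros((p, m))])
nz = n + p

Qe, R = np.array([[1.0]]), np.array([[1.0]])
C1 = np.hstack([C, -np.eye(p)])            # tracking error e_k = C1 z_k
Q1 = C1.T @ Qe @ C1                        # stage cost: z'Q1 z + u'R u

def quad_basis(v):
    """Monomial basis so that v' H v = quad_basis(v) @ h for symmetric H."""
    return np.array([v[i] * v[j] * (1.0 if i == j else 2.0)
                     for i in range(len(v)) for j in range(i, len(v))])

K = np.zeros((m, nz))                      # initial stabilizing policy u = -K z
for _ in range(20):
    # Policy evaluation: least-squares fit of the Q-function kernel H from
    # the Bellman equation Q(z,u) = z'Q1 z + u'R u + Q(z', -K z'),
    # using only measured data (no model knowledge enters this step).
    Phi, Y = [], []
    for k in range(300):
        if k % 10 == 0:                    # restarts keep the data exciting
            z = rng.standard_normal(nz)
        u = -K @ z + 0.1 * rng.standard_normal(m)   # exploration noise
        z_next = T @ z + B1 @ u            # "measured" successor state
        u_next = -K @ z_next
        zu = np.concatenate([z, u])
        zu_next = np.concatenate([z_next, u_next])
        Phi.append(quad_basis(zu) - quad_basis(zu_next))
        Y.append(z @ Q1 @ z + u @ R @ u)
        z = z_next
    h = np.linalg.lstsq(np.array(Phi), np.array(Y), rcond=None)[0]
    H = np.zeros((nz + m, nz + m))         # unpack h into symmetric H
    idx = 0
    for i in range(nz + m):
        for j in range(i, nz + m):
            H[i, j] = H[j, i] = h[idx]
            idx += 1
    # Policy improvement: u = -Huu^{-1} Huz z, computed from H alone.
    K_new = np.linalg.solve(H[nz:, nz:], H[nz:, :nz])
    if np.linalg.norm(K_new - K) < 1e-8:
        K = K_new
        break
    K = K_new

print("learned feedback/feedforward gain K =", K)
```

For the deterministic sketch, each collected sample satisfies the Bellman equation exactly, so the least-squares fit recovers the kernel of the current policy's Q-function and the iteration converges to the optimal tracking gain; the stochastic setting of the paper additionally requires the system transformation and GSAE machinery described in the abstract.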
Authors

ZHANG Zhengyi, ZHAO Xueyan (School of Automation Science and Engineering, South China University of Technology, Guangzhou 510640)
Source

Journal of Nanjing University of Information Science & Technology (Natural Science Edition) (《南京信息工程大学学报(自然科学版)》), 2021, No. 5, pp. 548-555 (8 pages); CAS indexed, Peking University Core Journal.
Funding

National Natural Science Foundation of China (61873099, 62073144); Natural Science Foundation of Guangdong Province (2020A1515010441); Science and Technology Program of Guangzhou (202002030158, 202002030389).
Keywords

stochastic systems
Q-learning algorithm
optimal tracking control
stochastic algebraic equation