Abstract
An off-policy Q-learning algorithm is proposed for solving the non-zero-sum game problem of linear discrete-time systems. First, the non-zero-sum game optimization problem is formulated, and the value function defined from each player's performance index is rigorously shown to be linear quadratic. Then, based on dynamic programming and Q-learning, an off-policy Q-learning algorithm is developed that yields an approximate optimal solution of the non-zero-sum game and drives the system to a global Nash equilibrium. The algorithm does not require the system model parameters to be known a priori; it learns the Nash equilibrium solution entirely from measurable data. Finally, simulation results demonstrate the effectiveness of the proposed method.
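The model-based backbone that the paper's off-policy Q-learning approximates from data can be sketched as coupled policy iteration for a two-player non-zero-sum LQ game: evaluate the cost-to-go of the current feedback pair, then let each player best-respond to the other's policy. The system matrices, weights, and the omission of cross control-weighting terms below are illustrative assumptions, not the paper's example — the paper's contribution is recovering the same Nash gains from measured data without knowing A, B1, B2.

```python
import numpy as np

def dlyap(Acl, M, iters=2000):
    """Fixed-point solve of P = Acl^T P Acl + M (valid when rho(Acl) < 1)."""
    P = np.zeros_like(M)
    for _ in range(iters):
        P = M + Acl.T @ P @ Acl
    return P

def nzs_nash_gains(A, B1, B2, Q1, Q2, R11, R22, iters=60):
    """Coupled policy iteration for a two-player non-zero-sum LQ game
    (cross-weights R12 = R21 = 0 assumed for brevity)."""
    n = A.shape[0]
    K1 = np.zeros((B1.shape[1], n))  # player 1 feedback gain, u1 = -K1 x
    K2 = np.zeros((B2.shape[1], n))  # player 2 feedback gain, u2 = -K2 x
    for _ in range(iters):
        Acl = A - B1 @ K1 - B2 @ K2
        # Policy evaluation: P_i is player i's cost-to-go under (K1, K2).
        P1 = dlyap(Acl, Q1 + K1.T @ R11 @ K1)
        P2 = dlyap(Acl, Q2 + K2.T @ R22 @ K2)
        # Policy improvement: each player's LQR best response to the other.
        K1 = np.linalg.solve(R11 + B1.T @ P1 @ B1, B1.T @ P1 @ (A - B2 @ K2))
        K2 = np.linalg.solve(R22 + B2.T @ P2 @ B2, B2.T @ P2 @ (A - B1 @ K1))
    return K1, K2, P1, P2

# Illustrative two-state system with one scalar input per player.
A  = np.array([[0.9, 0.1], [0.0, 0.8]])
B1 = np.array([[1.0], [0.0]])
B2 = np.array([[0.0], [1.0]])
Q1 = np.eye(2); Q2 = np.eye(2)
R11 = np.array([[1.0]]); R22 = np.array([[1.0]])
K1, K2, P1, P2 = nzs_nash_gains(A, B1, B2, Q1, Q2, R11, R22)
```

At convergence neither player can lower its cost by deviating unilaterally, which is the Nash stationarity condition the paper's data-driven algorithm targets; the off-policy variant replaces the two `dlyap` evaluations with least-squares Q-function estimates built from trajectory data.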
Authors
XIAO Zhen-fei, LI Jin-na (School of Information and Control Engineering, Liaoning Petrochemical University, Fushun 113001, China)
Source
Control Engineering of China (《控制工程》), 2022, No. 10, pp. 1874-1880 (7 pages). Indexed in CSCD and the Peking University Core Journals list (北大核心).
Funding
National Natural Science Foundation of China (62073158, 61673280)
Open Project of Key Fields of Liaoning Province (2019-KF-03-06)
Basic Scientific Research Project of the Education Department of Liaoning Province (LJKZ0401)
Keywords
adaptive dynamic programming
off-policy Q-learning
non-zero-sum game
Nash equilibrium