Two-player Optimization Control Based on Off-policy Q-learning Algorithm (cited by: 2)
Abstract This paper proposes an off-policy Q-learning algorithm for non-zero-sum games of linear discrete-time systems. First, the non-zero-sum game optimization problem is formulated, and the value function defined by each player's performance index is rigorously proved to be a linear quadratic form. Then, based on dynamic programming and Q-learning, an off-policy Q-learning algorithm is developed that yields an approximately optimal solution of the non-zero-sum game and achieves the global Nash equilibrium of the system. The algorithm does not require the system model parameters to be known a priori; it learns the Nash equilibrium solution entirely from measurable data. Finally, a simulation example verifies the effectiveness of the proposed method.
Authors XIAO Zhen-fei; LI Jin-na (School of Information and Control Engineering, Liaoning Petrochemical University, Fushun 113001, China)
Source Control Engineering of China (《控制工程》), CSCD, Peking University Core Journal, 2022, No. 10, pp. 1874-1880 (7 pages)
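Funding National Natural Science Foundation of China (62073158, 61673280); Liaoning Province Key Field Open Project (2019-KF-03-06); Basic Scientific Research Project of the Education Department of Liaoning Province (LJKZ0401).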
Keywords adaptive dynamic programming; off-policy Q-learning; non-zero-sum game; Nash equilibrium
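
Note on the problem setting. The abstract does not give the paper's equations, so the block below is only a sketch of the standard textbook form of a two-player linear-quadratic non-zero-sum game, offered to make the terms "value function", "Q-function", and "Nash equilibrium" concrete. Every symbol in it ($A$, $B_i$, $Q_i$, $R_{ij}$, $K_i$, $P_i$, $H_i$) is an assumed, generic notation and may differ from what the authors use.

```latex
% Assumed generic notation; not taken from the paper itself.
% Linear discrete-time dynamics driven by two players:
x_{k+1} = A x_k + B_1 u_{1,k} + B_2 u_{2,k}

% Performance index of player i (i = 1, 2):
J_i = \sum_{k=0}^{\infty}
      \left( x_k^{\top} Q_i x_k
           + u_{1,k}^{\top} R_{i1} u_{1,k}
           + u_{2,k}^{\top} R_{i2} u_{2,k} \right)

% Under linear state feedback u_{i,k} = -K_i x_k, each value function is quadratic in the state:
V_i(x_k) = x_k^{\top} P_i x_k

% The matching Q-function is quadratic in the stacked state-input vector:
Q_i(x_k, u_{1,k}, u_{2,k}) = z_k^{\top} H_i z_k,
\qquad z_k = \begin{bmatrix} x_k \\ u_{1,k} \\ u_{2,k} \end{bmatrix}

% Nash equilibrium (u_1^*, u_2^*): no player gains by deviating unilaterally.
J_1(u_1^*, u_2^*) \le J_1(u_1, u_2^*), \qquad
J_2(u_1^*, u_2^*) \le J_2(u_1^*, u_2)
```

In this generic setting, a model-free method estimates the matrices $H_i$ from measured state and input data rather than from $A$, $B_1$, $B_2$, which is consistent with the abstract's statement that the system model parameters need not be known.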