
Seeking equilibrium for linear-quadratic two-player Stackelberg game: a Q-learning approach
Abstract: In recent years, the Stackelberg game has contributed substantially to security control of cyber-physical systems and to energy management in smart grids. Existing methods for seeking the Stackelberg equilibrium rely heavily on complete knowledge of the system dynamics; however, exact system dynamics are difficult to obtain in real applications, which restricts the application of these theoretical results to some extent. In view of this, this paper proposes to seek the equilibrium of a Stackelberg game in a model-free way. Specifically, we investigate the linear-quadratic two-player Stackelberg game, in which the game state evolves along a linear system and the cost functions are quadratic. The two players are called the leader and the follower: the leader makes its decision first, taking into account the reaction function of the follower, while the follower reacts optimally to the leader's strategy. Owing to the linear state dynamics and quadratic cost functions, and the fact that the leader acts before the follower, the decision-making problem of the leader and the follower can be formulated as a bilevel linear-quadratic optimal control problem. Following the principle "from the follower to the leader", this paper derives a pair of optimal control strategies via dynamic programming. The resulting strategies are shown to be exactly the Stackelberg equilibrium strategies, but computing them requires knowledge of the system dynamics. Based on these strategies, a new actor-critic-based Q-learning algorithm is proposed, which approximates the equilibrium strategies without any knowledge of the system dynamics. It is proved that under the proposed Q-learning algorithm, the system state and the estimation errors of the actor and critic network weights are uniformly ultimately bounded. Simulation results show that the control strategies obtained from the proposed Q-learning algorithm stabilize the system state, and the cost functions under the estimated control strategies deviate only slightly from those under the equilibrium strategies.
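To make the "from the follower to the leader" dynamic-programming step concrete, the following is a minimal illustrative sketch (not the authors' code): a feedback-Stackelberg backward recursion for an assumed finite-horizon linear-quadratic game in which the follower's best response to the leader's input is computed first and the leader then optimizes against that reaction. All matrices (A, B1, B2, Q1, Q2, R11, R12, R21, R22) and the horizon N are hypothetical values chosen for illustration; the paper's actual system data, horizon, information structure, and its model-free actor-critic Q-learning algorithm are not reproduced here.

```python
# Illustrative feedback-Stackelberg recursion for an assumed LQ two-player game.
# Assumed dynamics and costs (u: leader, v: follower):
#   x_{k+1} = A x_k + B1 u_k + B2 v_k
#   J1 = sum_k  x'Q1 x + u'R11 u + v'R12 v      (leader)
#   J2 = sum_k  x'Q2 x + u'R21 u + v'R22 v      (follower)
import numpy as np

A  = np.array([[1.0, 0.1],
               [0.0, 1.0]])
B1 = np.array([[0.0], [0.1]])
B2 = np.array([[0.05], [0.1]])
Q1, Q2   = np.eye(2), 0.5 * np.eye(2)
R11, R12 = np.eye(1), 0.1 * np.eye(1)
R21, R22 = 0.1 * np.eye(1), np.eye(1)
N = 50  # assumed horizon

P1, P2 = Q1.copy(), Q2.copy()  # terminal value matrices for leader and follower
for _ in range(N):
    # Step 1 (follower): best response to a given leader input u, i.e. v = -Kx x - Ku u,
    # obtained by minimizing x'Q2 x + u'R21 u + v'R22 v + (A x + B1 u + B2 v)' P2 (...).
    S  = R22 + B2.T @ P2 @ B2
    Kx = np.linalg.solve(S, B2.T @ P2 @ A)
    Ku = np.linalg.solve(S, B2.T @ P2 @ B1)

    # Step 2 (leader): anticipate that reaction, substitute v = -Kx x - Ku u, minimize over u.
    At = A  - B2 @ Kx          # dynamics seen by the leader after substitution
    Bt = B1 - B2 @ Ku
    H  = R11 + Ku.T @ R12 @ Ku + Bt.T @ P1 @ Bt
    L1 = np.linalg.solve(H, Ku.T @ R12 @ Kx + Bt.T @ P1 @ At)   # u = -L1 x
    L2 = Kx - Ku @ L1                                           # induced follower gain, v = -L2 x

    # Step 3: Riccati-like value updates under the equilibrium feedback pair.
    Acl = A - B1 @ L1 - B2 @ L2
    P1  = Q1 + L1.T @ R11 @ L1 + L2.T @ R12 @ L2 + Acl.T @ P1 @ Acl
    P2  = Q2 + L1.T @ R21 @ L1 + L2.T @ R22 @ L2 + Acl.T @ P2 @ Acl

print("leader gain L1:\n", L1)
print("follower gain L2:\n", L2)
```

Note that this recursion uses A, B1, B2 explicitly; the paper's contribution is precisely to approximate such equilibrium gains without that model information, via an actor-critic Q-learning scheme, which is not sketched here.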
Authors: Man LI, Jiahu QIN, Long WANG (Department of Automation, University of Science and Technology of China, Hefei 230027, China; Center for Systems and Control, Peking University, Beijing 100871, China)
Source: Scientia Sinica Informationis (《中国科学: 信息科学》), indexed in CSCD and the Peking University Core Journal list, 2022, Issue 6, pp. 1083-1097 (15 pages)
Funding: National Natural Science Foundation of China (Grant Nos. 61922076, 61873252, 62036002); Fok Ying Tung Education Foundation Fund for Young Teachers in Higher Education Institutions (Grant No. 161059); Peking University-Baidu Fund (Grant No. 2020BD017).
Keywords: linear-quadratic two-player Stackelberg game; optimal control; model-free; actor-critic structure; Q-learning