Abstract: This paper studies the policy iteration algorithm (PIA) for zero-sum stochastic differential games with the basic long-run average criterion, as well as with its more selective version, the so-called bias criterion. The system is assumed to be a nondegenerate diffusion. We use Lyapunov-like stability conditions that ensure the existence and boundedness of the solution to a certain Poisson equation. We also ensure the convergence of a sequence of such solutions, of the corresponding sequence of policies, and, ultimately, of the PIA.
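The "certain Poisson equation" referred to above is, in generic notation (an assumption based on standard usage for average-payoff problems, not reproduced from the paper), the Poisson equation associated with a fixed pair of stationary policies (u, v):

```latex
% rho(u,v): long-run average payoff; h: bias function;
% b: drift, a = sigma sigma^T: (nondegenerate) diffusion matrix.
\[
\mathcal{L}^{u,v}h(x)+r\bigl(x,u(x),v(x)\bigr)=\rho(u,v),
\qquad
\mathcal{L}^{u,v}h(x)=b\bigl(x,u(x),v(x)\bigr)^{\top}\nabla h(x)
+\tfrac{1}{2}\operatorname{tr}\bigl[a(x)\nabla^{2}h(x)\bigr].
\]
```

Existence and boundedness of the pair (rho, h) for every policy pair is what the Lyapunov-like conditions are meant to guarantee.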
Funding: supported by the National Science Foundation (No. ECCS-0801330) and the Army Research Office (No. W91NF-05-1-0314)
Abstract: This paper presents an approximate/adaptive dynamic programming (ADP) algorithm that uses the idea of integral reinforcement learning (IRL) to determine online the Nash equilibrium solution for the two-player zero-sum differential game with linear dynamics and an infinite-horizon quadratic cost. The algorithm is built around an iterative method developed in the control engineering community for solving the continuous-time game algebraic Riccati equation (CT-GARE), which underlies the game problem. We show how the ADP techniques enhance the capabilities of the offline method, allowing an online solution without requiring complete knowledge of the system dynamics. The feasibility of the ADP scheme is demonstrated in simulation for a power system control application, where the adaptation goal is the control policy that best handles the worst-case load disturbance.
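The offline iteration underlying this approach can be sketched as a simultaneous policy-update loop on the CT-GARE, in which each step evaluates the current policy pair via a Lyapunov equation. The scalar system data below are illustrative assumptions, not the paper's power-system model:

```python
# Offline policy-iteration sketch for the continuous-time game ARE
#   A'P + PA + Q - P (B1 R^{-1} B1' - (1/gamma^2) B2 B2') P = 0.
# Both players' gains are updated from the current value matrix P;
# the Lyapunov solve plays the role of the policy-evaluation step.
import numpy as np
from scipy.linalg import solve_continuous_lyapunov

A  = np.array([[-1.0]])   # stable drift (illustrative)
B1 = np.array([[1.0]])    # control input matrix
B2 = np.array([[1.0]])    # disturbance input matrix
Q  = np.array([[1.0]])
R  = np.array([[1.0]])
gamma = 2.0               # attenuation level

K = np.zeros((1, 1))      # initial (stabilising) control gain
L = np.zeros((1, 1))      # initial disturbance gain
P = np.zeros((1, 1))
for _ in range(50):
    Ac = A - B1 @ K + B2 @ L          # closed-loop drift
    # Policy evaluation: Ac'P + P Ac = -(Q + K'RK - gamma^2 L'L)
    P = solve_continuous_lyapunov(
        Ac.T, -(Q + K.T @ R @ K - gamma**2 * L.T @ L))
    K = np.linalg.solve(R, B1.T @ P)  # minimiser (control) update
    L = (1.0 / gamma**2) * B2.T @ P   # maximiser (disturbance) update

# Residual of the game ARE at the converged P (should be ~0)
residual = A.T @ P + P @ A + Q - P @ (B1 @ np.linalg.solve(R, B1.T)
                                      - (1.0 / gamma**2) * B2 @ B2.T) @ P
```

For this scalar example the GARE reduces to 0.75 p^2 + 2p - 1 = 0, so the iterates converge to p = (sqrt(7) - 2)/1.5. The IRL contribution of the paper is to perform the evaluation step from measured trajectory data instead of from the model matrices.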
Funding: supported by DAAD-PPP Hong Kong/Germany (No. G. HK 036/09)
Abstract: A large class of stochastic differential games for several players is considered in this paper. The class includes Nash differential games as well as Stackelberg differential games, and a mix of the two is possible. The existence of feedback strategies under general conditions is proved. The limitations concern the functionals, in which the state and the controls appear separately; this is also true for the state equations. The controls appear quadratically in the payoff and linearly in the state equation. The most serious restriction is the dimension of the state equation, which cannot exceed 2. The reason comes from the PDE (partial differential equation) techniques used in studying the system of Bellman equations obtained by dynamic programming arguments. The authors' previous work in 2002 has no such restriction, but it imposes serious restrictions on the structure of the Hamiltonians, which are violated in the applications dealt with in this article.
Funding: sponsored by the National Natural Science Foundation of China (Grant No. 90716028)
Abstract: Based upon the theory of the nonlinear quadratic two-person nonzero-sum differential game, it is shown that the finite-time mixed H2/H∞ control problem can be turned into the problem of solving for a state-feedback Nash equilibrium point. On this basis, a theorem on the solution of the state-feedback control is given, and the Lyapunov stability of the nonlinear system under this control is proved. This solution is then used to design the nonlinear H2/H∞ guidance law for the relative motion between missile and target in three-dimensional (3D) space. By solving two coupled Hamilton-Jacobi partial differential inequalities (HJPDI), a control with stronger robust stability and robust performance is obtained. For different H∞ performance indexes, the corresponding weighting factors of the control are designed analytically. Finally, simulations under different robust performance indexes, different initial conditions, and interception of different maneuvering targets are carried out. All results indicate that the designed law is valid.
Abstract: We consider a finite-horizon, zero-sum linear quadratic differential game. The feature of this game is that the weight matrix of the minimiser's control cost in the cost functional is singular. Due to this singularity, the game can be solved neither by applying the Isaacs MinMax principle nor by using the Bellman-Isaacs equation approach, i.e. this game is singular. A previous paper of one of the authors analysed such a game in the case where the cost functional does not contain the minimiser's control cost at all, i.e. the weight matrix of this cost equals zero; in that case, all coordinates of the minimiser's control are singular. In the present paper, we study the general case where the weight matrix of the minimiser's control cost, being singular, is not, in general, zero. This means that only a part of the coordinates of the minimiser's control are singular, while the others are regular. The considered game is treated by regularisation, i.e. by its approximate conversion to an auxiliary regular game. The latter has the same equation of dynamics and a similar cost functional, augmented by an integral of the squares of the singular control coordinates with a small positive weight. Thus, the auxiliary game is a partial cheap control differential game. Based on a singular perturbation asymptotic analysis of this auxiliary game, the existence of the value of the original (singular) game is established, and its expression is obtained. The maximiser's optimal state-feedback strategy and the minimising control sequence in the original game are designed. It is shown that the coordinates of the minimising control sequence corresponding to the regular coordinates of the minimiser's control are point-wise convergent in the class of regular functions. The optimal trajectory sequence and the optimal trajectory in the considered singular game are also obtained. An illustrative example is presented.
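In generic notation (the symbols below are assumptions for illustration, not copied from the paper), the regularisation described above replaces the singular cost functional by a cheap-control one:

```latex
% Original zero-sum LQ cost with singular minimiser weight R_u >= 0,
% and its cheap-control regularisation: only the singular coordinates
% u_s of the minimiser's control receive the small penalty eps^2.
\[
J(u,v)=x^{\top}(T)Fx(T)
+\int_{0}^{T}\bigl(x^{\top}Qx+u^{\top}R_{u}u-v^{\top}R_{v}v\bigr)\,dt,
\]
\[
J_{\varepsilon}(u,v)=J(u,v)
+\varepsilon^{2}\int_{0}^{T}\|u_{s}(t)\|^{2}\,dt,
\qquad \varepsilon>0\ \text{small},
\]
\[
V=\lim_{\varepsilon\to 0}V_{\varepsilon}.
\]
```

The last line expresses the role of the singular-perturbation analysis: the value of the original singular game is recovered as the limit of the values of the regular auxiliary games.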
Abstract: In this paper, the pursuit-evasion game with state and control constraints is solved to achieve the Nash equilibrium of both the pursuer and the evader with an iterative self-play technique. Under the condition that the Hamiltonian formed by means of Pontryagin's maximum principle has a unique solution, it can be proven that the iterative control law converges to the Nash equilibrium solution. However, the strong nonlinearity of the ordinary differential equations formulated by Pontryagin's maximum principle makes the control policy difficult to figure out. Moreover, the system dynamics employed in this manuscript contains a high-dimensional state vector with constraints. In practical applications, such as aircraft control, the available overload is limited. Therefore, in this paper we consider the optimal strategy of pursuit-evasion games with a constant constraint on the control, while some state variables are restricted by a function of the input. To address these challenges, the optimal control problems are transformed into nonlinear programming problems through the direct collocation method. Finally, two numerical cases of the aircraft pursuit-evasion scenario are given to demonstrate the effectiveness of the presented method in obtaining the optimal control of both the pursuer and the evader.
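The transcription step this abstract relies on (direct collocation) can be illustrated on a toy problem. The double-integrator plant, horizon, boundary conditions, and control bound below are illustrative assumptions, not the aircraft model of the paper:

```python
# Direct collocation: transcribe a toy constrained optimal control problem
#   min  ∫_0^1 u(t)^2 dt
#   s.t. x' = v,  v' = u,  |u| <= 10,
#        x(0)=v(0)=0,  x(1)=1,  v(1)=0
# into a nonlinear program using trapezoidal collocation.
import numpy as np
from scipy.optimize import minimize

N = 20                       # collocation intervals
h = 1.0 / N

def unpack(z):
    x, v, u = z[:N+1], z[N+1:2*(N+1)], z[2*(N+1):]
    return x, v, u

def cost(z):                 # trapezoid rule for the integral of u^2
    _, _, u = unpack(z)
    return h * (0.5 * (u[0]**2 + u[-1]**2) + np.sum(u[1:-1]**2))

def defects(z):              # collocation defects + boundary conditions
    x, v, u = unpack(z)
    dx = x[1:] - x[:-1] - 0.5 * h * (v[1:] + v[:-1])
    dv = v[1:] - v[:-1] - 0.5 * h * (u[1:] + u[:-1])
    bc = [x[0], v[0], x[-1] - 1.0, v[-1]]
    return np.concatenate([dx, dv, bc])

z0 = np.concatenate([np.linspace(0.0, 1.0, N + 1),   # state guess
                     np.ones(N + 1),                 # velocity guess
                     np.zeros(N + 1)])               # control guess
bnds = [(None, None)] * (2 * (N + 1)) + [(-10.0, 10.0)] * (N + 1)
res = minimize(cost, z0, method="SLSQP", bounds=bnds,
               constraints={"type": "eq", "fun": defects},
               options={"maxiter": 500})
x_opt, v_opt, u_opt = unpack(res.x)
```

For this toy problem the analytic optimum is u(t) = 6 - 12t with cost 12, so the NLP solution can be checked directly; the paper applies the same transcription idea to the pursuer's and evader's coupled problems.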
Abstract: The missile interception problem can be regarded as a two-person zero-sum differential game, whose solution depends on the Hamilton-Jacobi-Isaacs (HJI) equation. It has been proved impossible to obtain a closed-form solution due to the nonlinearity of the HJI equation, and many iterative algorithms have been proposed to solve it. The simultaneous policy updating algorithm (SPUA) is an effective algorithm for solving the HJI equation, but it is an on-policy integral reinforcement learning (IRL) method: for online implementation of SPUA, the disturbance signals need to be adjustable, which is unrealistic. In this paper, an off-policy IRL algorithm based on SPUA is proposed that does not use any knowledge of the system dynamics. A neural-network-based online adaptive critic implementation scheme of the off-policy IRL algorithm is then presented. Based on the online off-policy IRL method, a computational intelligence interception guidance (CIIG) law is developed for intercepting highly maneuvering targets. As a model-free method, interception can be achieved by measuring system data online. The effectiveness of the CIIG is verified through two missile-target engagement scenarios.
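For reference, the equation at the core of this abstract takes the following standard form for dynamics of the type \(\dot{x}=f(x)+g(x)u+k(x)w\) with cost \(\int(x^{\top}Qx+u^{\top}Ru-\gamma^{2}\|w\|^{2})\,dt\) (generic notation, not necessarily the paper's):

```latex
% HJI equation for the zero-sum game with value function V(x);
% u: minimising control, w: maximising disturbance, gamma: attenuation level.
\[
0=\nabla V^{\top}f(x)+x^{\top}Qx
-\tfrac{1}{4}\,\nabla V^{\top}g(x)R^{-1}g(x)^{\top}\nabla V
+\tfrac{1}{4\gamma^{2}}\,\nabla V^{\top}k(x)k(x)^{\top}\nabla V,
\]
with saddle-point policies
\[
u^{*}(x)=-\tfrac{1}{2}R^{-1}g(x)^{\top}\nabla V,
\qquad
w^{*}(x)=\tfrac{1}{2\gamma^{2}}\,k(x)^{\top}\nabla V .
\]
```

The quadratic terms in \(\nabla V\) are the source of the nonlinearity that rules out a closed-form solution and motivates iterative schemes such as SPUA.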
Abstract: For constrained nonlinear systems subject to unknown but bounded disturbances, a new robust economic model predictive control (EMPC) strategy is proposed that guarantees input-to-state stability (ISS) of the closed-loop system with respect to the disturbance input. Based on differential game principles, the economic objective function and a robust stability objective function defined with respect to the optimal economic equilibrium point are optimized separately, where economic optimality and robust stability are two conflicting control objectives. The optimal value function of the robust stability objective is used to construct an implicit contraction constraint for the EMPC optimization, and both recursive feasibility of the robust EMPC and ISS of the closed-loop system with respect to the bounded disturbances, relative to the optimal economic equilibrium point, are established. Finally, taking a continuous stirred-tank reactor as an example, comparative simulations verify the effectiveness of the proposed strategy.
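The ISS property invoked above has the standard definition (generic notation; here \(|x|\) would be read as the distance to the optimal economic equilibrium point): the closed loop is input-to-state stable with respect to the disturbance \(w\) if there exist \(\beta\in\mathcal{KL}\) and \(\gamma\in\mathcal{K}\) such that

```latex
\[
|x(t;x_{0},w)|\;\le\;\beta\bigl(|x_{0}|,t\bigr)
+\gamma\Bigl(\sup_{0\le s\le t}|w(s)|\Bigr),
\qquad \forall\,t\ge 0 .
\]
```

So the state converges to a neighbourhood of the equilibrium whose size is bounded by a gain on the disturbance magnitude, which is exactly what a robust EMPC scheme must certify.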
Abstract: With the large-scale development and application of renewable energy, power grids are becoming increasingly large and complex, and coordinating a large number of different controllers is one of the most pressing concerns. Differential game theory can be used to solve such cooperative control problems. However, traditional algorithms struggle to solve constrained differential game problems. Moreover, the simulation models established in existing studies are almost all linear, which limits practical engineering application. To address these problems, a co-evolutionary algorithm based on the Weighting Fruit Fly Optimization Algorithm (WFOA) is proposed to solve a multi-area frequency cooperative control model with nonlinear constraints. Simulation results show that, compared with the co-evolutionary genetic algorithm and the cooperative multi-objective particle swarm optimization algorithm, the proposed method achieves better control efficiency while remaining robust to external disturbance changes and variations in internal unit parameters.
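The weighted FOA variant and the frequency-control model are not specified in this abstract; as background, a minimal sketch of the basic fruit-fly optimisation loop that WFOA builds on, minimising an illustrative one-dimensional objective (all parameters are assumptions), is:

```python
# Basic fruit-fly optimisation (FOA) loop: flies wander randomly around
# the swarm location, the "smell concentration" S = 1/distance is scored
# by the objective, and the swarm greedily moves to the best fly.
# Objective and parameters are illustrative only.
import numpy as np

rng = np.random.default_rng(0)

def objective(s):                  # toy objective, minimum at s = 2
    return (s - 2.0) ** 2

n_flies, n_iter = 30, 200
x_axis, y_axis = rng.uniform(0, 1, size=2)     # initial swarm location
best_smell = np.inf
for _ in range(n_iter):
    x = x_axis + rng.uniform(-1, 1, n_flies)   # random search step
    y = y_axis + rng.uniform(-1, 1, n_flies)
    dist = np.sqrt(x**2 + y**2)
    s = 1.0 / dist                              # smell concentration
    smell = objective(s)
    i = np.argmin(smell)
    if smell[i] < best_smell:                   # greedy swarm update
        best_smell = smell[i]
        x_axis, y_axis = x[i], y[i]
```

A co-evolutionary, weighted version would run one such population per control area and couple their fitness evaluations through the shared (constrained, nonlinear) frequency model.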