Peer-to-peer computation offloading is a promising approach that enables resource-limited Internet of Things (IoT) devices to offload their computation-intensive tasks to idle peer devices in proximity. Unlike dedicated servers, peer devices offer spare computation resources that are random and intermittent, which affects offloading performance. Mutual interference among multiple simultaneous offloading requestors sharing the same wireless channel further complicates the offloading decisions. In this work, we investigate the opportunistic peer-to-peer task offloading problem by jointly considering stochastic task arrivals, dynamic inter-user interference, and the opportunistic availability of peer devices. Each requestor decides both its local computation frequency and its offloading transmission power to minimize its own expected long-term cost of task completion, which accounts for its energy consumption, task delay, and task loss due to buffer overflow. The dynamic decision process among multiple requestors is formulated as a stochastic game. By constructing post-decision states, we propose a decentralized online offloading algorithm in which each requestor, acting as an independent learning agent, learns to approach the optimal strategies from its local observations. Simulation results under different system parameter configurations demonstrate that the proposed online algorithm outperforms several existing algorithms, especially in scenarios with a large task arrival probability or a small helper availability probability.
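A small per-slot model makes the post-decision state concrete. This is a minimal sketch under assumed forms that the abstract does not specify:

- a CMOS-style local energy model, where energy per cycle is proportional to f²;
- a Shannon-rate uplink in which the other requestors' signals appear as interference;
- a weighted per-slot cost.

All constants and the name `slot_transition` are hypothetical, and the learning algorithm itself is not shown.

```python
import math

# All constants below are illustrative assumptions, not values from the paper.
KAPPA = 1e-27            # effective switched capacitance of the local CPU
CYCLES_PER_TASK = 1e8    # CPU cycles needed per task
BITS_PER_TASK = 1e6      # bits transmitted per offloaded task
SLOT = 1.0               # slot length (s)
BANDWIDTH = 1e6          # channel bandwidth (Hz)
NOISE = 1e-9             # noise power (W)
BUFFER_MAX = 5           # task buffer capacity


def slot_transition(queue, freq, power, gain, interference, helper_available,
                    arrivals, w_energy=1.0, w_delay=1.0, w_loss=10.0):
    """One slot of a single (hypothetical) requestor.

    Returns (post_decision_queue, next_queue, dropped, cost).
    """
    # Local computing: whole tasks finished at CPU frequency `freq` in one slot.
    local_done = min(queue, int(freq * SLOT // CYCLES_PER_TASK))
    local_energy = KAPPA * freq ** 2 * CYCLES_PER_TASK * local_done

    # Offloading: Shannon rate with interference from other requestors,
    # possible only when a peer helper happens to be available.
    offloaded, tx_energy = 0, 0.0
    if helper_available and power > 0:
        sinr = power * gain / (NOISE + interference)
        rate = BANDWIDTH * math.log2(1.0 + sinr)
        offloaded = min(queue - local_done, int(rate * SLOT // BITS_PER_TASK))
        tx_energy = power * SLOT

    # Post-decision state: backlog after the action, before new random arrivals.
    post = queue - local_done - offloaded

    # Random arrivals are added; anything beyond the buffer is lost.
    total = post + arrivals
    next_queue = min(total, BUFFER_MAX)
    dropped = total - next_queue

    # Per-slot cost: energy + delay (backlog as a proxy) + penalty for lost tasks.
    cost = (w_energy * (local_energy + tx_energy)
            + w_delay * next_queue + w_loss * dropped)
    return post, next_queue, dropped, cost
```

For example, in `slot_transition(5, 2e8, 0.1, 1e-7, 0.0, False, 4)` the requestor processes two tasks locally. It cannot offload because no helper is available, so its post-decision backlog is 3. It then loses 2 of the 4 new arrivals to buffer overflow. A value function learned over the post-decision backlog separates the agent's own decision from the random arrivals that follow it.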
A necessary maximum principle is given for nonzero-sum stochastic differential games with random jumps. The result is applied to solve the H2/H∞ control problem of stochastic systems with random jumps. A necessary and sufficient condition for the existence of a unique solution to the H2/H∞ control problem is derived. The resulting solution is given by the solution of an uncontrolled forward-backward stochastic differential equation with random jumps.
In this paper, we deal with a class of two-player zero-sum linear-quadratic stochastic differential game problems. We show that an open-loop saddle point exists if and only if both the lower and upper values exist.
This paper constructs a non-cooperative/cooperative stochastic differential game model to prove that, if the system's configuration is fixed, the optimal strategy trajectory of agents in a system whose topology is a Multi-Local-World graph converges to a certain attractor. Because of their economic and management properties, almost all systems are divided into several independent Local-Worlds, and the interaction between agents in such systems is more complex. The interaction between agents in the same Local-World is defined as a stochastic differential cooperative game; conversely, the interaction between agents in different Local-Worlds is defined as a stochastic differential non-cooperative game. We construct a non-cooperative/cooperative stochastic differential game model to describe the interaction between agents. The solutions of the cooperative and non-cooperative games are obtained by invoking the corresponding theories, and a nonlinear operator is then constructed to couple these two solutions. Finally, the optimal strategy trajectory of agents in the system is proven to converge to a certain attractor, which means that the strategy trajectory becomes certain as time tends to infinity or to a large positive integer. We conclude that the optimal strategy trajectory of the cooperative/non-cooperative stochastic differential game with a nonlinear operator can make agents within a Local-World coordinate, maximize the Local-World payment, and bring all Local-Worlds into equilibrium. Furthermore, the optimal strategy of the coupled game converges to a particular attractor that determines the optimal property.
We consider a model of network formation as a stochastic game with random duration, proposed initially in Sun and Parilina (Autom Remote Control 82(6):1065–1082, 2021). In the model, the leader first suggests a joint project to the other players, i.e., the network connecting them. Second, the players are allowed to form fresh links with each other, updating the initially proposed network. The stage payoff of any player depends on the network structure.

There are two types of randomness in the network formation process:

(i) links may fail to be formed, with different probabilities, even though players intend to establish them;
(ii) the game process may terminate at any stage or transit to the next stage according to a certain probability distribution.

Finally, a network is formed as a result of the players' decisions and the realization of the random variables.

The cooperative version of the stochastic game is investigated. In particular, we examine the subgame consistency and the strong subgame consistency of the core. We provide a payment mechanism, or regularization of the core elements, that sustains the core's subgame consistency and prevents players from deviating from the cooperative trajectory. In addition, when the distribution procedure of the core elements yields negative payments, it is regularized so that players receive only nonnegative payments at every stage. A sufficient condition for a strongly subgame consistent core is also obtained. We illustrate our theoretical results with a numerical example.
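A toy cooperative game can illustrate the core and the idea of stage-wise payments. This sketch rests on three assumptions:

- a generic three-player TU game with made-up characteristic values that are not derived from the network-formation model;
- a constant continuation probability `q`;
- the hypothetical helper names `in_core` and `payment_schedule`.

The schedule β(t) = ξ(t) − q·ξ(t+1) is one common form of such a distribution procedure. Stage t is reached with probability q^t, so the expected total payment recovers ξ(0). The schedule also shows how negative stage payments can arise.

```python
from itertools import combinations


def in_core(x, v, tol=1e-9):
    """Check whether allocation x (dict player -> payoff) is in the core of TU game v."""
    players = frozenset(x)
    if abs(sum(x.values()) - v[players]) > tol:
        return False  # not efficient: the grand coalition's worth is not split exactly
    for k in range(1, len(players)):
        for coalition in combinations(sorted(players), k):
            s = frozenset(coalition)
            if sum(x[i] for i in s) < v.get(s, 0.0) - tol:
                return False  # coalition s could do better on its own and blocks x
    return True


def payment_schedule(xi, q):
    """Stage payments beta(t) = xi(t) - q * xi(t+1) for one player.

    xi[t] is the player's core share in the subgame starting at stage t,
    q is an assumed constant continuation probability, and xi(T+1) = 0.
    """
    return [xi[t] - q * (xi[t + 1] if t + 1 < len(xi) else 0.0)
            for t in range(len(xi))]


# Hypothetical characteristic function of a symmetric 3-player game.
v = {frozenset({1}): 0.0, frozenset({2}): 0.0, frozenset({3}): 0.0,
     frozenset({1, 2}): 4.0, frozenset({1, 3}): 4.0, frozenset({2, 3}): 4.0,
     frozenset({1, 2, 3}): 9.0}
```

Here the equal split (3, 3, 3) lies in the core, while (8, 1, 0) is blocked by coalition {2, 3}. Similarly, `payment_schedule([1.0, 5.0], 0.8)` yields a negative first-stage payment. That is the kind of situation the regularization described above is meant to avoid.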
The existence and uniqueness of solutions for a class of forward-backward stochastic differential equations, whose noise sources are a Brownian motion and a Poisson process, are established under monotonicity conditions. These results are then applied to nonzero-sum differential games with random jumps to obtain the open-loop Nash equilibrium point explicitly in terms of the solution of the forward-backward stochastic differential equations.
Unmanned aerial vehicle (UAV)-mounted base stations are emerging as an effective solution for providing wireless communication service to a target region containing smart objects (SOs) in the Internet of Things (IoT). This paper investigates the efficient deployment of multiple UAVs for IoT communication in a dynamic environment. We first define a measure of UAV-to-SO communication performance in the target region, which serves as the optimization objective. An SO is active when it needs to transmit or receive data; otherwise, it is silent. Each SO switches between these two states with a certain probability, which results in a dynamic communication environment. In this environment, the UAVs cannot know the active states of the SOs in advance, and only neighboring UAVs can communicate with each other. To overcome these deployment challenges, we leverage a game-theoretic learning approach to solve the position-selection problem. The problem is modeled as a stochastic game, which is proven to be an exact potential game that admits a best Nash equilibrium (NE). Furthermore, a distributed position optimization algorithm is proposed, which converges to a pure-strategy NE. Numerical results demonstrate the excellent performance of the proposed algorithm.
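Best-response dynamics give a simple sketch of why finite exact potential games reach a pure-strategy NE. This example is a simplified stand-in. It assumes every UAV's utility equals one common objective: expected coverage quality over the SOs, weighted by assumed activity probabilities. That choice makes the game trivially an exact potential game. The candidate positions, SO locations, and the `quality` function are all hypothetical. Unlike the distributed algorithm in the abstract, this version lets each UAV evaluate the global objective instead of exchanging information only with neighbors.

```python
import math

# Hypothetical candidate UAV positions and SO locations on a plane.
CANDIDATES = [(0, 0), (2, 0), (4, 0), (0, 2), (2, 2), (4, 2)]
SOS = [(0.5, 0.2), (3.8, 0.1), (1.9, 2.1), (4.2, 1.8)]
P_ACTIVE = [0.9, 0.5, 0.7, 0.3]   # assumed probabilities that each SO is active


def quality(uav, so):
    """Assumed link-quality proxy that decreases with UAV-to-SO distance."""
    return 1.0 / (1.0 + math.dist(uav, so) ** 2)


def potential(profile):
    """Expected coverage: each SO, weighted by its activity probability,
    is served by the best UAV in the profile (a list of candidate indices)."""
    return sum(p * max(quality(CANDIDATES[a], so) for a in profile)
               for so, p in zip(SOS, P_ACTIVE))


def best_response_dynamics(profile, max_rounds=1000):
    """Let UAVs switch, one at a time, to a strictly better position."""
    profile = list(profile)
    for _ in range(max_rounds):
        improved = False
        for i in range(len(profile)):
            best_a, best_val = profile[i], potential(profile)
            for a in range(len(CANDIDATES)):
                val = potential(profile[:i] + [a] + profile[i + 1:])
                if val > best_val + 1e-12:
                    best_a, best_val = a, val
            if best_a != profile[i]:
                profile[i] = best_a
                improved = True
        if not improved:   # no UAV can improve: pure-strategy NE reached
            break
    return profile


def is_pure_nash(profile):
    """True if no single UAV can strictly increase its utility by moving."""
    base = potential(profile)
    return all(potential(profile[:i] + [a] + profile[i + 1:]) <= base + 1e-12
               for i in range(len(profile)) for a in range(len(CANDIDATES)))
```

Each strict improvement raises the potential, and there are only finitely many position profiles. The loop therefore stops, for example from `best_response_dynamics([0, 0, 0])`, at a profile where `is_pure_nash` holds.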
In this paper, a leader-follower stochastic differential game is studied for a linear stochastic differential equation with quadratic cost functionals. The coefficients in the state equation and the weighting matrices in the cost functionals are all deterministic. Closed-loop strategies are introduced, which are required to be independent of initial states; this property makes them very useful and convenient in applications. The follower first solves a stochastic linear-quadratic optimal control problem, and his optimal closed-loop strategy is characterized by a Riccati equation together with an adapted solution to a linear backward stochastic differential equation. The leader then solves a stochastic linear-quadratic optimal control problem for a forward-backward stochastic differential equation; necessary conditions for the existence of the leader's optimal closed-loop strategy are given by a Riccati equation. Some examples are also given.
In this paper we study zero-sum stochastic games. The optimality criterion is the long-run expected average criterion, and the payoff function may have neither upper nor lower bounds. We give a new set of conditions for the existence of a value and a pair of optimal stationary strategies. Our conditions are slightly weaker than those in the previous literature, and some new sufficient conditions for the existence of a pair of optimal stationary strategies are imposed on the primitive data of the model. Our results are illustrated with a queueing system for which our conditions are satisfied but some of the conditions in the previous literature fail to hold.
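Shapley's value iteration on a finite zero-sum stochastic game gives some intuition for values and optimal stationary strategies. Note that this sketch uses the discounted criterion with factor `beta` and bounded payoffs. It is therefore a simpler analogue of the long-run average setting with unbounded payoffs studied above, not a method for it. Each stage game is assumed to be 2×2 so that its value has a closed form. The function names are illustrative.

```python
def value_2x2(m):
    """Value of a 2x2 zero-sum matrix game; the row player maximizes."""
    (a, b), (c, d) = m
    lower = max(min(a, b), min(c, d))   # maximin over pure strategies
    upper = min(max(a, c), max(b, d))   # minimax over pure strategies
    if lower == upper:
        return lower                    # a pure saddle point exists
    return (a * d - b * c) / (a + d - b - c)  # value in mixed strategies


def shapley_iteration(rewards, transitions, beta, tol=1e-10, max_iter=10_000):
    """Discounted zero-sum stochastic game with two actions per player.

    rewards[s][i][j]     : stage payoff to the maximizer in state s
    transitions[s][i][j] : dict mapping next state -> probability
    Iterates V(s) <- val[ r(s,i,j) + beta * sum_t P(t|s,i,j) V(t) ],
    a contraction for 0 <= beta < 1.
    """
    v = [0.0] * len(rewards)
    for _ in range(max_iter):
        new_v = []
        for s in range(len(rewards)):
            m = [[rewards[s][i][j]
                  + beta * sum(p * v[t] for t, p in transitions[s][i][j].items())
                  for j in range(2)] for i in range(2)]
            new_v.append(value_2x2(m))
        if max(abs(x - y) for x, y in zip(new_v, v)) < tol:
            return new_v
        v = new_v
    return v
```

Repeated matching pennies, a single state with payoffs ±1, gives value 0. A game that pays 1 regardless of the actions gives 1/(1 − beta).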
This paper investigates Nash games for a class of linear stochastic systems governed by Itô’s differential equation with Markovian jump parameters, over both finite and infinite time horizons. First, stochastic Nash games are formulated by applying the results of indefinite stochastic linear-quadratic (LQ) control problems. Second, to obtain Nash equilibrium strategies, cross-coupled stochastic Riccati differential (algebraic) equations (CSRDEs and CSRAEs) are derived. Moreover, to demonstrate the validity of the obtained results, stochastic H2/H∞ control with state- and control-dependent noise is discussed as an immediate application. Finally, a numerical example is provided.
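A scalar special case shows the structure of the cross-coupled Riccati equations. The sketch makes these assumptions:

- two players and no Markovian jumps;
- state-dependent noise only, dx = (a x + b₁u₁ + b₂u₂)dt + c x dW;
- costs J_i = E∫(q_i x² + r_i u_i²)dt;
- linear feedback u_i = −(b_i p_i / r_i) x.

Under these assumptions each player's algebraic Riccati equation depends on the other player's gain. The code solves the coupled pair by plain fixed-point iteration, which is not guaranteed to converge in general. The names and parameters are illustrative.

```python
import math


def riccati_residuals(p, a, c, b, q, r):
    """Residuals of the coupled scalar equations, for i = 0, 1 and j = 1 - i:
    2 (a - b_j^2 p_j / r_j) p_i + c^2 p_i + q_i - b_i^2 p_i^2 / r_i = 0."""
    return [2 * (a - b[1 - i] ** 2 * p[1 - i] / r[1 - i]) * p[i]
            + c ** 2 * p[i] + q[i] - b[i] ** 2 * p[i] ** 2 / r[i]
            for i in (0, 1)]


def solve_coupled_sare(a, c, b, q, r, tol=1e-12, max_iter=1000):
    """Jacobi-style iteration: fix p_j, solve player i's quadratic for p_i > 0."""
    p = [0.0, 0.0]
    for _ in range(max_iter):
        new_p = []
        for i in (0, 1):
            j = 1 - i
            s = 2 * (a - b[j] ** 2 * p[j] / r[j]) + c ** 2  # closed-loop drift term
            k = b[i] ** 2 / r[i]
            # positive root of k p^2 - s p - q_i = 0
            new_p.append((s + math.sqrt(s * s + 4 * k * q[i])) / (2 * k))
        if max(abs(x - y) for x, y in zip(new_p, p)) < tol:
            return new_p
        p = new_p
    raise RuntimeError("fixed-point iteration did not converge")
```

With a = −1, c = 0.5, and unit b, q, r, the symmetric solution satisfies 3p² + 1.75p − 1 = 0, so p₁ = p₂ = (−1.75 + √15.0625)/6. The feedback gains are then u_i = −p_i x.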
To meet the demands of large-scale user access with computation-intensive and delay-sensitive applications, the combination of ultra-dense networks (UDNs) and mobile edge computing (MEC) is considered an important solution. In MEC-enabled UDNs, one of the most important issues is computation offloading. Although much work has been done on this issue, dynamic computation offloading in time-varying environments, especially for multiple users, has not been fully considered. To fill this gap, this paper considers the dynamic multi-user computation offloading problem in a time-varying environment. By considering the dynamic changes of the channel states and the users' queue states, the problem is formulated as a stochastic game that aims to optimize the users' delay and packet loss rate. To find the optimal solution of the formulated optimization problem, a Nash Q-learning (NQLN) algorithm is proposed, which converges quickly to a Nash equilibrium solution. Finally, extensive simulation results demonstrate the superiority of the NQLN algorithm, showing that it achieves better optimization performance than the benchmark schemes.
In this paper we first investigate zero-sum two-player stochastic differential games with reflection, with the help of the theory of reflected backward stochastic differential equations (RBSDEs). We establish the dynamic programming principle for the upper and lower value functions of this kind of stochastic differential game with reflection in a straightforward way. The upper and lower value functions are then proved to be the unique viscosity solutions of the associated upper and lower Hamilton-Jacobi-Bellman-Isaacs equations with obstacles, respectively. The method differs significantly from those used for control problems with reflection, and the new techniques developed are of independent interest. Further, we prove a new estimate for RBSDEs that is sharper than the one in the paper of El Karoui, Kapoudjian, Pardoux, Peng and Quenez (1997). This estimate turns out to be very useful because it allows us to estimate the L^p-distance between the solutions of two different RBSDEs by the p-th power of the distance between the initial values of the driving forward equations. We also show that the unique viscosity solution of the approximating Isaacs equation constructed by the penalization method converges to the viscosity solution of the Isaacs equation with obstacle.
The stochastic variational inequality (SVI) provides a unified form of optimality conditions for stochastic optimization and stochastic games, which have wide applications in science, engineering, economics and finance. In the past two decades, the one-stage SVI has been studied extensively and widely used to model equilibrium problems under uncertainty. Moreover, the recently proposed two-stage and multistage SVIs can be applied when decision makers want to make decisions at different stages in a stochastic environment. The two-stage SVI, which is the foundation of the multistage SVI, seeks a pair consisting of a “here-and-now” solution and a “wait-and-see” solution. This paper surveys recent developments in the analysis, algorithms and applications of the two-stage SVI.
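As a point of reference for the one-stage case, this sketch solves a small SVI with an affine random map F(x, ξ) = A(ξ)x + b(ξ) over the box [0, 1]². The expectation is replaced by a sample average approximation (SAA), and the resulting deterministic VI is solved by the projection method x ← Proj(x − γF(x)). The random data, the step size `step`, and the function names are assumptions, chosen so that the averaged F is strongly monotone. The two-stage “here-and-now”/“wait-and-see” structure surveyed in the paper is not modeled.

```python
import random


def project_box(x, lo, hi):
    """Euclidean projection onto the box [lo, hi]^n."""
    return [min(max(xi, lo), hi) for xi in x]


def saa_operator(samples):
    """Sample-average map F(x) = mean_k (A_k x + b_k) from (A_k, b_k) samples."""
    n, m = len(samples[0][1]), len(samples)
    A = [[sum(s[0][i][j] for s in samples) / m for j in range(n)] for i in range(n)]
    b = [sum(s[1][i] for s in samples) / m for i in range(n)]
    return lambda x: [sum(A[i][j] * x[j] for j in range(n)) + b[i] for i in range(n)]


def projection_method(F, x0, lo, hi, step=0.1, tol=1e-10, max_iter=100_000):
    """Fixed-point iteration x <- Proj(x - step * F(x)); converges when F is
    strongly monotone and Lipschitz and the step is small enough."""
    x = x0
    for _ in range(max_iter):
        x_new = project_box([xi - step * fi for xi, fi in zip(x, F(x))], lo, hi)
        if max(abs(u - v) for u, v in zip(x_new, x)) < tol:
            return x_new
        x = x_new
    return x


# Hypothetical random data: the symmetric part of each A_k is at least 1.5 * I.
rng = random.Random(0)
samples = []
for _ in range(200):
    e = rng.uniform(-0.5, 0.5)
    samples.append(([[2.0 + e, 0.5], [-0.5, 2.0 + e]],
                    [rng.uniform(-3.0, 1.0), rng.uniform(-1.0, 3.0)]))

F = saa_operator(samples)
x_star = projection_method(F, [0.0, 0.0], 0.0, 1.0)
```

A solution of the VI is characterized by a zero natural residual x − Proj(x − F(x)). This gives a convenient stopping check that does not depend on the step size.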
This technical note is concerned with the maximum principle for a nonzero-sum stochastic differential game with discrete and distributed delays. Both the state variable and the players' control variables involve discrete and distributed delays. By virtue of the duality method and generalized anticipated backward stochastic differential equations, the author establishes a necessary maximum principle and a sufficient verification theorem. To illustrate the theoretical results, the author applies them to a dynamic advertising game problem.
This paper focuses on zero-sum stochastic differential games in the framework of forward-backward stochastic differential equations on a finite time horizon, with both players adopting impulse controls. By means of BSDE methods, in particular Peng's notion of stochastic backward semigroups, the authors prove a dynamic programming principle for both the upper and the lower value functions of the game. The upper and lower value functions are then shown to be the unique viscosity solutions of the Hamilton-Jacobi-Bellman-Isaacs equations with a double obstacle. As a consequence, uniqueness implies that the upper and lower value functions coincide and the game admits a value.
We study a class of partial-information nonzero-sum differential games of mean-field backward doubly stochastic differential equations, in which the coefficients contain not only the state process but also its marginal distribution, and the cost functional is also of mean-field type. The control is required to be adapted to a sub-filtration of the filtration generated by the underlying Brownian motions. We establish a necessary condition in the form of a maximum principle, as well as a verification theorem that gives a sufficient condition for a Nash equilibrium point. We apply the theoretical results to a partial-information linear-quadratic (LQ) game and obtain its unique Nash equilibrium point by virtue of the unique solvability of mean-field forward-backward doubly stochastic differential equations.
This paper studies the policy iteration algorithm (PIA) for zero-sum stochastic differential games with the basic long-run average criterion, as well as with its more selective version, the so-called bias criterion. The system is assumed to be a nondegenerate diffusion. We use Lyapunov-like stability conditions that ensure the existence and boundedness of the solution to a certain Poisson equation. We also establish the convergence of a sequence of such solutions, of the corresponding sequence of policies, and, ultimately, of the PIA.
This article is the second part of Active Power Correction Strategies Based on Deep Reinforcement Learning. In Part II, we consider scenarios in which renewable energy is plugged into a large-scale power grid and provide an adaptive algorithmic implementation to maintain power grid stability. Building on the robustness method of Part I, a distributed deep reinforcement learning method is proposed to overcome the influence of increasing renewable energy penetration. A multi-agent system is implemented in multiple control areas of the power system, and the agents play a fully cooperative stochastic game. Using the Monte Carlo tree search described in Part I, we select practical actions in each sub-control area to search for the Nash equilibrium of the game. Based on the QMIX method, a structure of offline centralized training and online distributed execution is proposed to select better practical actions for active power correction control. The proposed method is evaluated on the modified global competition scenarios of “2020 Learning to Run a Power Network. NeurIPS Track 2”.
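QMIX-style decentralized execution relies on monotonic mixing. If the joint value Q_tot is non-decreasing in every agent's utility, then each agent's own greedy action jointly maximizes Q_tot. This sketch assumes a single hidden mixing layer with fixed, hypothetical weights. In QMIX these weights are produced by hypernetworks conditioned on the global state; that part is omitted here. The final lines check that the decentralized greedy choice matches a brute-force centralized maximization.

```python
import itertools
import math


def elu(z):
    """Exponential linear unit (strictly increasing)."""
    return z if z > 0 else math.exp(z) - 1.0


def qmix_mix(agent_qs, w1, b1, w2, b2):
    """Q_tot = sum_k |w2[k]| * elu(sum_i |w1[k][i]| * q_i + b1[k]) + b2.

    Taking absolute values keeps every mixing weight non-negative, so Q_tot
    is monotonically non-decreasing in each agent's utility q_i.
    """
    hidden = [elu(sum(abs(w) * q for w, q in zip(row, agent_qs)) + bias)
              for row, bias in zip(w1, b1)]
    return sum(abs(w) * h for w, h in zip(w2, hidden)) + b2


# Hypothetical utilities Q_i(a_i) for two control areas with three actions each.
AGENT_Q = [[0.2, 1.5, -0.3], [0.9, -1.0, 0.4]]
W1 = [[0.5, -1.2], [-0.7, 0.3], [1.1, 0.8]]   # signs are removed inside qmix_mix
B1 = [0.1, -0.2, 0.0]
W2 = [-0.6, 1.0, 0.4]
B2 = 0.05

# Decentralized execution: each agent picks its own greedy action.
greedy = tuple(max(range(len(q)), key=q.__getitem__) for q in AGENT_Q)

# Centralized check: brute-force maximization of Q_tot over all joint actions.
best_joint = max(
    itertools.product(*(range(len(q)) for q in AGENT_Q)),
    key=lambda a: qmix_mix([AGENT_Q[i][ai] for i, ai in enumerate(a)],
                           W1, B1, W2, B2))
```

Because all the weights here are nonzero, Q_tot is strictly increasing in each utility. The greedy joint action (1, 0) is therefore the unique maximizer.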
Motivated by various mean-field type linear-quadratic (MF-LQ, for short) multi-level Stackelberg games, we propose a kind of multi-level self-similar randomized domination-monotonicity structure. When the coefficients of a class of mean-field type forward-backward stochastic differential equations (MF-FBSDEs, for short) satisfy this kind of structure, we prove the existence and uniqueness of solutions, an estimate, and the continuous dependence of the solutions on the coefficients. Further, the theoretical results are applied to construct unique Stackelberg equilibria for forward and backward MF-LQ multi-level Stackelberg games, respectively.
Funding (peer-to-peer computation offloading): supported by the National Natural Science Foundation of China (No. 62101601)
Funding (maximum principle for nonzero-sum games with random jumps): supported by the Doctoral Foundation of University of Jinan (XBS1213) and the National Natural Science Foundation of China (11101242)
Funding (zero-sum linear-quadratic stochastic differential game): the Young Research Foundation (201201130) of Jilin Provincial Science & Technology Department and the Research Foundation (2011LG17) of Changchun University of Technology
Funding (Multi-Local-World game model): supported by the National Natural Science Foundation of China (Grant Nos. 72174064, 71671054, and 61976064), the Natural Science Foundation of Shandong Province, “Dynamic Coordination Mechanism of the Fresh Agricultural Produce Supply Chain Driven by Customer Behavior from the Perspective of Quality Loss” (ZR2020MG004), and the Industrial Internet Security Evaluation Service Project (TC210W09P).
Funding (network formation as a stochastic game): supported by the Russian Science Foundation (No. 22-21-00346).
Funding (forward-backward SDEs with random jumps): National Natural Science Foundation of China, Outstanding Young Teachers of Ministry of Education of China, Special Fund for Ph.D. Program of Ministry of Education of China, Fok Ying Tung Education Foundation
Funding (UAV deployment for IoT communication): supported in part by the Natural Science Foundation of China under Grants 61801243, 61671144, and 61971238, by the China Postdoctoral Science Foundation under Grant 2019M651914, by the Natural Science Foundation of the Jiangsu Higher Education Institutions of China under Grant 18KJB510026, and by the Foundation of Nanjing University of Posts and Telecommunications under Grant NY218124
Funding (leader-follower stochastic differential game): This work was supported by the National Key Research & Development Program of China under Grant No. 2022YFA1006104, the National Natural Science Foundation of China under Grant Nos. 11971266 and 11831010, and the Shandong Provincial Natural Science Foundation under Grant Nos. ZR2022JQ01, ZR2020ZD24, and ZR2019ZD42.
Funding (Nash games for Markov jump stochastic systems): supported by the National Natural Science Foundation of China (No. 71171061), the China Postdoctoral Science Foundation (No. 2014M552177), the Natural Science Foundation of Guangdong Province (No. S2011010004970), the Doctors Start-up Project of Guangdong University of Technology (No. 13ZS0031), and the 2014 Guangzhou Philosophy and Social Science Project (No. 14Q21).
Funding: Supported by the National Key Research and Development Program of China (2019YFB1804403).
Abstract: To meet the demands of large-scale user access with computation-intensive and delay-sensitive applications, combining ultra-dense networks (UDNs) and mobile edge computing (MEC) is considered an important solution. In MEC-enabled UDNs, one of the most important issues is computation offloading. Although many works have addressed this issue, dynamic computation offloading in a time-varying environment, especially for multiple users, has not been fully considered. To fill this gap, this paper considers the dynamic multi-user computation offloading problem in a time-varying environment. By accounting for the dynamic changes of the channel state and the users’ queue states, the problem is formulated as a stochastic game that aims to optimize the users' delay and packet loss rate. To find the optimal solution of the formulated optimization problem, a Nash Q-learning (NQLN) algorithm is proposed, which converges quickly to a Nash equilibrium solution. Finally, extensive simulation results demonstrate the superiority of the NQLN algorithm, which achieves better optimization performance than the benchmark schemes.
Funding: Supported by the Agence Nationale de la Recherche (France), reference ANR-10-BLAN 0112, and the Marie Curie ITN "Controlled Systems", call: FP7-PEOPLE-2007-1-1-ITN, no. 213841-2; also supported by the National Natural Science Foundation of China (No. 10701050, 11071144), the National Basic Research Program of China (973 Program) (No. 2007CB814904), Shandong Province (No. Q2007A04), the Independent Innovation Foundation of Shandong University, and the Project sponsored by SRF for ROCS, SEM.
Abstract: In this paper we first investigate zero-sum two-player stochastic differential games with reflection, using the theory of reflected backward stochastic differential equations (RBSDEs). We establish the dynamic programming principle for the upper and lower value functions of this kind of stochastic differential game with reflection in a straightforward way. The upper and lower value functions are then proved to be the unique viscosity solutions to the associated upper and lower Hamilton-Jacobi-Bellman-Isaacs equations with obstacles, respectively. The method differs significantly from those used for control problems with reflection, and the new techniques developed are of independent interest. Further, we prove a new estimate for RBSDEs that is sharper than the one in the paper of El Karoui, Kapoudjian, Pardoux, Peng and Quenez (1997). This estimate turns out to be very useful because it allows us to estimate the L^p-distance of the solutions of two different RBSDEs by the p-th power of the distance between the initial values of the driving forward equations. We also show that the unique viscosity solution to the approximating Isaacs equation constructed by the penalization method converges to the viscosity solution of the Isaacs equation with obstacle.
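To illustrate reflection and penalization, consider a discrete-time analogue on a symmetric random-walk tree approximating Brownian motion. The sketch assumes a zero driver, f = 0. The reflected scheme is Y_k = max(h(W_k), E_k[Y_{k+1}]), which keeps Y above the obstacle h. The penalized scheme instead solves Y_k = E_k[Y_{k+1}] + m(h(W_k) - Y_k)⁺Δt exactly at each node, and it should approach the reflected value as the penalty m grows. The tree, the zero driver and the name `tree_values` are assumptions made for illustration. This is not the continuous-time construction used in the paper.

```python
import math


def tree_values(g, h, T, n, penalty=None):
    """Backward induction for a discrete RBSDE with driver f = 0.

    Node (k, j) carries W = (2j - k) * sqrt(dt). The terminal value is g(W_n)
    and the lower obstacle is h(W_k). With penalty=None the reflected scheme is
    used; otherwise the implicit penalized scheme with parameter `penalty`.
    Returns Y at the root.
    """
    dt = T / n
    sq = math.sqrt(dt)
    Y = [g((2 * j - n) * sq) for j in range(n + 1)]
    for k in range(n - 1, -1, -1):
        layer = []
        for j in range(k + 1):
            cont = 0.5 * (Y[j] + Y[j + 1])  # E_k[Y_{k+1}] over the two children
            obstacle = h((2 * j - k) * sq)
            if penalty is None:
                layer.append(max(obstacle, cont))  # reflection keeps Y >= obstacle
            elif cont >= obstacle:
                layer.append(cont)  # penalty term inactive
            else:
                # Solve Y = cont + penalty * (obstacle - Y) * dt for Y exactly.
                layer.append((cont + penalty * dt * obstacle) / (1 + penalty * dt))
        Y = layer
    return Y[0]
```

Take the concave obstacle h(w) = -w² with g = h. The reflected root value is 0, while the unreflected expectation is -T. The penalized root is roughly -1/m, so it converges to the reflected value as m increases.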
Funding: Supported by the Hong Kong Research Grant Council PolyU (No. 153001/18P) and by the National Natural Science Foundation of China (Nos. 11871276 and 11571178).
Abstract: The stochastic variational inequality (SVI) provides a unified form of the optimality conditions of stochastic optimization and stochastic games, which have wide applications in science, engineering, economics and finance. Over the past two decades, the one-stage SVI has been studied extensively and widely used in modeling equilibrium problems under uncertainty. Moreover, the recently proposed two-stage and multistage SVIs can be applied when decision makers need to make decisions at different stages in a stochastic environment. The two-stage SVI, which is the foundation of the multistage SVI, seeks a pair consisting of a “here-and-now” solution and a “wait-and-see” solution. This paper surveys recent developments in the analysis, algorithms and applications of the two-stage SVI.
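As a building block for the one-stage case, an SVI asks for x ∈ K with F(x)ᵀ(y - x) ≥ 0 for all y ∈ K, where F(x) = E[F(x, ξ)]. The sketch below replaces the expectation by a sample average (SAA) and runs the basic projection method x ← Π_K(x - αF_N(x)). This method converges when F is strongly monotone and Lipschitz and the step α is small enough. Several choices are hypothetical: the affine map F(x, ξ) = Mx + ξ, the box K = [lo, hi]ⁿ, the samples and the step size. The sketch does not implement a two-stage ("here-and-now" / "wait-and-see") method.

```python
def sample_average_map(M, samples):
    """SAA map F_N(x) = M x + (1/N) sum_k xi_k for F(x, xi) = M x + xi."""
    n = len(M)
    mean = [sum(s[i] for s in samples) / len(samples) for i in range(n)]

    def F(x):
        return [sum(M[i][j] * x[j] for j in range(n)) + mean[i] for i in range(n)]

    return F


def project_box(x, lo, hi):
    """Euclidean projection onto the box [lo, hi]^n (componentwise clipping)."""
    return [min(max(v, lo), hi) for v in x]


def projection_method(F, x0, lo, hi, step=0.1, tol=1e-10, max_iter=100_000):
    """Iterate x <- P_K(x - step * F(x)) until successive iterates agree to tol."""
    x = list(x0)
    for _ in range(max_iter):
        x_new = project_box([xi - step * fi for xi, fi in zip(x, F(x))], lo, hi)
        if max(abs(a - b) for a, b in zip(x_new, x)) < tol:
            return x_new
        x = x_new
    raise RuntimeError("projection method did not converge")
```

Take M = [[2, 1], [-1, 2]], whose symmetric part is 2I, with samples (-3, 1) and (-1, 3) on K = [0, 1]². The iteration should reach x ≈ (1, 0), where the natural residual x - Π_K(x - F_N(x)) vanishes.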
Funding: Supported by the National Natural Science Foundation of China under Grant No. 11701214 and the Shandong Provincial Natural Science Foundation, China, under Grant No. ZR2019MA045.
Abstract: This technical note is concerned with the maximum principle for a non-zero-sum stochastic differential game with discrete and distributed delays. Not only the state variable but also the players' control variables involve discrete and distributed delays. By virtue of the duality method and generalized anticipated backward stochastic differential equations, the author establishes a necessary maximum principle and a sufficient verification theorem. To illustrate the theoretical results, the author applies them to a dynamic advertising game problem.
Funding: Supported by the National Natural Science Foundation of China under Grant Nos. 11701040, 11871010 and 61871058, and the Fundamental Research Funds for the Central Universities under Grant No. 2019XDA11.
Abstract: This paper focuses on zero-sum stochastic differential games in the framework of forward-backward stochastic differential equations on a finite time horizon, with both players adopting impulse controls. By means of BSDE methods, in particular Peng’s notion of stochastic backward semigroups, the authors prove a dynamic programming principle for both the upper and the lower value functions of the game. The upper and lower value functions are then shown to be the unique viscosity solutions of the Hamilton-Jacobi-Bellman-Isaacs equations with a double obstacle. As a consequence, the uniqueness implies that the upper and lower value functions coincide and the game admits a value.
Funding: Supported in part by the National Natural Science Foundation of China (Grant Nos. 11871309, 11671229, 71871129, 11371226, 11301298), the National Key R&D Program of China (Grant No. 2018YFA0703900), the Natural Science Foundation of Shandong Province (No. ZR2019MA013), the Special Funds of Taishan Scholar Project (No. tsqn20161041), and the Fostering Project of Dominant Discipline and Talent Team of Shandong Province Higher Education Institutions.
Abstract: We study a kind of partial-information non-zero-sum differential game of mean-field backward doubly stochastic differential equations, in which the coefficients depend not only on the state process but also on its marginal distribution, and the cost functional is also of mean-field type. The control is required to be adapted to a sub-filtration of the filtration generated by the underlying Brownian motions. We establish a necessary condition in the form of a maximum principle and a verification theorem, which is a sufficient condition for a Nash equilibrium point. We use the theoretical results to solve a partial-information linear-quadratic (LQ) game, and obtain the unique Nash equilibrium point of this LQ game by virtue of the unique solvability of mean-field forward-backward doubly stochastic differential equations.
Abstract: This paper studies the policy iteration algorithm (PIA) for zero-sum stochastic differential games with the basic long-run average criterion, as well as with its more selective version, the so-called bias criterion. The system is assumed to be a nondegenerate diffusion. We use Lyapunov-like stability conditions that ensure the existence and boundedness of the solution to a certain Poisson equation. We also ensure the convergence of a sequence of such solutions, of the corresponding sequence of policies, and, ultimately, of the PIA.
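In a policy iteration scheme for average-criterion games, each iteration first evaluates the current pair of stationary strategies by solving a Poisson equation g + h(x) = c(x) + Σ_y P(x, y)h(y), normalized by h(0) = 0. Here g is the average payoff and h the relative value (bias) function. The sketch below shows only this evaluation step. It assumes the diffusion has been replaced by a finite-state Markov chain, and that P and c are the transition matrix and running cost induced by the fixed strategy pair. The chain is assumed unichain, so the linear system is nonsingular. The improvement step, which solves a game at every state, is omitted, and `poisson_equation` is an illustrative name.

```python
def solve_linear(A, b):
    """Gaussian elimination with partial pivoting (A square and nonsingular)."""
    n = len(A)
    M = [list(row) + [bi] for row, bi in zip(A, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for k in range(col, n + 1):
                M[r][k] -= factor * M[col][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][k] * x[k] for k in range(r + 1, n))) / M[r][r]
    return x


def poisson_equation(P, c):
    """Solve g + h(x) = c(x) + sum_y P[x][y] h(y) with h(0) = 0.

    Unknowns z = (g, h(1), ..., h(n-1)); h(0) is fixed to 0, so its column drops.
    Returns (g, h).
    """
    n = len(P)
    A, b = [], []
    for x in range(n):
        row = [1.0]  # coefficient of the average payoff g
        for y in range(1, n):
            row.append((1.0 if x == y else 0.0) - P[x][y])
        A.append(row)
        b.append(c[x])
    z = solve_linear(A, b)
    return z[0], [0.0] + z[1:]
```

Take a two-state chain that mixes uniformly, P = [[0.5, 0.5], [0.5, 0.5]] with costs c = [1, 3]. It has average payoff g = 2 and bias h = [0, 2].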
Funding: Supported by the National Key R&D Program of China under Grant 2018AAA0101502.
Abstract: This article is the second part of Active Power Correction Strategies Based on Deep Reinforcement Learning. In Part II, we consider scenarios in which renewable energy is plugged into the large-scale power grid and provide an adaptive algorithmic implementation to maintain power grid stability. Building on the robustness method in Part I, a distributed deep reinforcement learning method is proposed to overcome the influence of increasing renewable energy penetration. A multi-agent system is implemented in multiple control areas of the power system, and the agents play a fully cooperative stochastic game. Using the Monte Carlo tree search described in Part I, we select practical actions in each sub-control area to search for the Nash equilibrium of the game. Based on the QMIX method, a structure of offline centralized training and online distributed execution is proposed to employ better practical actions in active power correction control. Our proposed method is evaluated on the modified global competition scenario cases of “2020 Learning to Run a Power Network, NeurIPS Track 2”.
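QMIX can train centrally and still execute in a distributed way because its joint value Q_tot is monotone in every agent's Q_i. Each agent's greedy action is then consistent with the greedy joint action. The following is a minimal forward-pass sketch of the standard QMIX mixing network. A hypernetwork maps the global state to mixing weights, abs() makes those weights non-negative, and the hidden layer uses ELU. The sketch assumes plain Python lists and hypothetical, randomly initialized linear hypernetwork weights. Unlike the original QMIX, the final bias comes from a single linear layer rather than a two-layer hypernetwork. Agent networks, training and the Monte Carlo tree search are omitted.

```python
import math
import random


def elu(x):
    return x if x > 0 else math.exp(x) - 1.0


def linear(W, v, b):
    """Affine map W v + b on plain lists."""
    return [sum(w * x for w, x in zip(row, v)) + bias for row, bias in zip(W, b)]


def random_hypernet(n_agents, state_dim, embed, seed=0):
    """Hypothetical, randomly initialized linear hypernetwork weights."""
    rng = random.Random(seed)

    def mat(rows, cols):
        return [[rng.gauss(0.0, 1.0) for _ in range(cols)] for _ in range(rows)]

    def vec(size):
        return [rng.gauss(0.0, 1.0) for _ in range(size)]

    return {
        "embed": embed,
        "W_w1": mat(embed * n_agents, state_dim), "b_w1": vec(embed * n_agents),
        "W_b1": mat(embed, state_dim), "b_b1": vec(embed),
        "W_w2": mat(embed, state_dim), "b_w2": vec(embed),
        "W_b2": mat(1, state_dim), "b_b2": vec(1),
    }


def qmix_forward(agent_qs, state, hyper):
    """Q_tot = w2 . ELU(W1 q + b1) + b2 with state-conditioned, non-negative W1 and w2."""
    n, e = len(agent_qs), hyper["embed"]
    flat = [abs(v) for v in linear(hyper["W_w1"], state, hyper["b_w1"])]
    W1 = [flat[k * n:(k + 1) * n] for k in range(e)]  # e x n, all entries >= 0
    b1 = linear(hyper["W_b1"], state, hyper["b_b1"])
    hidden = [elu(z) for z in linear(W1, agent_qs, b1)]
    w2 = [abs(v) for v in linear(hyper["W_w2"], state, hyper["b_w2"])]  # >= 0
    b2 = linear(hyper["W_b2"], state, hyper["b_b2"])[0]  # bias may take any sign
    return sum(wk * hk for wk, hk in zip(w2, hidden)) + b2
```

Both weight layers are non-negative and ELU is nondecreasing, so raising any single agent's Q_i can never lower Q_tot. This is the monotonicity constraint QMIX relies on.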
Funding: This work is supported in part by the National Key R&D Program of China (Grant No. 2018YFA0703900) and the National Natural Science Foundation of China (Grant No. 11871310).
Abstract: Motivated by various mean-field type linear-quadratic (MF-LQ, for short) multi-level Stackelberg games, we propose a kind of multi-level self-similar randomized domination-monotonicity structure. When the coefficients of a class of mean-field type forward-backward stochastic differential equations (MF-FBSDEs, for short) satisfy this kind of structure, we prove the existence and uniqueness of solutions, an estimate, and the continuous dependence of the solutions on the coefficients. Further, the theoretical results are applied to construct unique Stackelberg equilibria for forward and backward MF-LQ multi-level Stackelberg games, respectively.