Funding: supported by the National Key Research and Development Program of China under Grant No. 2022YFA1004600, the National Natural Science Foundation of China under Grant No. 11931018, the Guangdong Basic and Applied Basic Research Foundation under Grant No. 2021A1515010057, and the Guangdong Province Key Laboratory of Computational Science at Sun Yat-sen University under Grant No. 2020B1212060032.
Abstract: This paper is concerned with nonzero-sum discrete-time stochastic games in Borel state and action spaces under the expected discounted payoff criterion. The payoff function can be unbounded. The transition probability is a convex combination of finitely many probability measures that are dominated by a probability measure on the state space and depend on the state variable. Under suitable conditions, the authors establish the existence of stationary almost Markov ε-equilibria and give an approximation method via stochastic games with bounded payoffs. Finally, a production game is introduced to illustrate the applications of the main result, which generalizes the bounded-payoff case.
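For orientation, the discounted criterion and the ε-equilibrium notion referred to above can be written in generic notation as follows; the symbols ($V_i$, $\pi_i$, $r_i$, $\alpha$) are illustrative placeholders rather than the paper's own.

```latex
% Expected discounted payoff of player i under a strategy profile (\pi_1,\pi_2)
V_i(x,\pi_1,\pi_2) \;=\; \mathbb{E}^{\pi_1,\pi_2}_{x}\!\left[\sum_{t=0}^{\infty}\alpha^{t}\, r_i\big(x_t,a_t^1,a_t^2\big)\right],
\qquad \alpha\in(0,1).

% A profile (\pi_1^*,\pi_2^*) is an \varepsilon-equilibrium if, for every player i and every state x,
V_i\big(x,\pi_i^*,\pi_{-i}^*\big) \;\ge\; \sup_{\pi_i} V_i\big(x,\pi_i,\pi_{-i}^*\big)\;-\;\varepsilon .
```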
Abstract: In this paper we study zero-sum stochastic games. The optimality criterion is the long-run expected average criterion, and the payoff function may have neither upper nor lower bounds. We give a new set of conditions for the existence of a value and a pair of optimal stationary strategies. Our conditions are slightly weaker than those in the previous literature, and some new sufficient conditions for the existence of a pair of optimal stationary strategies are imposed on the primitive data of the model. Our results are illustrated with a queueing system for which our conditions are satisfied but some of the conditions in the previous literature fail to hold.
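As a reference point, one common way to write the long-run expected average criterion and the value of a zero-sum game (illustrative notation, with player 1 maximizing and player 2 minimizing) is:

```latex
% Long-run expected average payoff under strategies (\pi_1,\pi_2), initial state x
J(x,\pi_1,\pi_2) \;=\; \liminf_{n\to\infty}\frac{1}{n}\,
\mathbb{E}^{\pi_1,\pi_2}_{x}\!\left[\sum_{t=0}^{n-1} r\big(x_t,a_t^1,a_t^2\big)\right].

% The game has a value V(x), attained by optimal stationary strategies (\pi_1^*,\pi_2^*), if
\sup_{\pi_1}\inf_{\pi_2} J(x,\pi_1,\pi_2)
\;=\;\inf_{\pi_2}\sup_{\pi_1} J(x,\pi_1,\pi_2)
\;=\; J(x,\pi_1^*,\pi_2^*)\;=\;V(x).
```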
Funding: supported by the National Key Research and Development Program of China (2019YFB1804403).
Abstract: To meet the demands of large-scale user access with computation-intensive and delay-sensitive applications, combining ultra-dense networks (UDNs) and mobile edge computing (MEC) is considered an important solution. In MEC-enabled UDNs, one of the most important issues is computation offloading. Although a number of works have addressed this issue, the problem of dynamic computation offloading in a time-varying environment, especially for multiple users, has not been fully considered. To fill this gap, this paper studies the multi-user dynamic computation offloading problem in a time-varying environment. By considering the dynamic changes of the channel state and the users' queue states, the problem is formulated as a stochastic game that aims to optimize the delay and packet loss rate of the users. To find the optimal solution of the formulated optimization problem, a Nash Q-learning (NQLN) algorithm is proposed, which converges quickly to a Nash equilibrium solution. Finally, extensive simulation results demonstrate the superiority of the NQLN algorithm, showing that it achieves better optimization performance than the benchmark schemes.
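Since the abstract gives no pseudocode, the following is a minimal, hypothetical sketch of a Nash-Q style update for two agents with small discrete action sets; the toy environment, the parameter values, and the pure-strategy-only equilibrium search are simplifying assumptions, not the NQLN algorithm itself.

```python
import numpy as np

rng = np.random.default_rng(0)
n_states, n_actions = 4, 2          # toy sizes; the real offloading problem is much larger
gamma, alpha, eps = 0.9, 0.1, 0.2   # discount, learning rate, exploration probability

# One Q-table per agent, indexed by (state, action_1, action_2).
Q1 = np.zeros((n_states, n_actions, n_actions))
Q2 = np.zeros((n_states, n_actions, n_actions))

def stage_nash(q1, q2):
    """Return a pure-strategy Nash equilibrium (a1, a2) of the stage game given by
    payoff matrices q1, q2; fall back to the joint action with the largest payoff
    sum if no pure equilibrium exists (a simplification of general Nash-Q)."""
    for a1 in range(n_actions):
        for a2 in range(n_actions):
            if q1[a1, a2] >= q1[:, a2].max() and q2[a1, a2] >= q2[a1, :].max():
                return a1, a2
    return np.unravel_index(np.argmax(q1 + q2), q1.shape)

def toy_env(s, a1, a2):
    """Hypothetical stand-in for the offloading environment dynamics."""
    s_next = (s + a1 + a2) % n_states
    r1 = -float(a1 != a2) - 0.1 * s
    r2 = -float(a1 == a2) - 0.1 * s
    return s_next, r1, r2

s = 0
for step in range(5000):
    # epsilon-greedy exploration around the current stage equilibrium
    if rng.random() < eps:
        a1, a2 = rng.integers(n_actions), rng.integers(n_actions)
    else:
        a1, a2 = stage_nash(Q1[s], Q2[s])
    s_next, r1, r2 = toy_env(s, a1, a2)
    # Nash-Q backup: bootstrap with the equilibrium value of the next stage game
    na1, na2 = stage_nash(Q1[s_next], Q2[s_next])
    Q1[s, a1, a2] += alpha * (r1 + gamma * Q1[s_next, na1, na2] - Q1[s, a1, a2])
    Q2[s, a1, a2] += alpha * (r2 + gamma * Q2[s_next, na1, na2] - Q2[s, a1, a2])
    s = s_next
```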
Funding: supported by the Major Science and Technology Programs in Henan Province (No. 241100210100), the Project of Science and Technology in Henan Province (No. 242102211068, No. 232102210078), the Key Field Special Project of Guangdong Province (No. 2021ZDZX1098), the China University Research Innovation Fund (No. 2021FNB3001, No. 2022IT020), and the Shenzhen Science and Technology Innovation Commission Stable Support Plan (No. 20231128083944001).
Abstract: Existing research on cyber attack-defense analysis has typically adopted stochastic game theory to model the problem, but such models assume complete rationality and ignore the information opacity of practical attack and defense scenarios, so the models and methods lack accuracy. To address this problem, we investigate network defense policy methods under finite (bounded) rationality constraints and propose a network defense policy selection algorithm based on deep reinforcement learning. Using graph-theoretic methods, we transform the decision-making problem into a path optimization problem and use a service-node-based compression method to map the network state. On this basis, we improve the A3C algorithm and design the Defense-A3C policy selection algorithm with online learning capability. The experimental results show that the proposed model and method stably converge to a better network state after training, faster and more stably than the original A3C algorithm. Compared with existing typical approaches, the advancement of Defense-A3C is also verified.
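For background, the core of any A3C-style learner is an n-step advantage actor-critic loss shared by the workers. The PyTorch snippet below is a generic single-worker sketch under assumed dimensions (`state_dim`, `n_actions`) and loss weights; it is not the Defense-A3C implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ActorCritic(nn.Module):
    """Shared trunk with a policy head (actor) and a value head (critic)."""
    def __init__(self, state_dim=16, n_actions=4):
        super().__init__()
        self.body = nn.Sequential(nn.Linear(state_dim, 64), nn.ReLU())
        self.policy = nn.Linear(64, n_actions)   # action logits
        self.value = nn.Linear(64, 1)            # state-value estimate

    def forward(self, x):
        h = self.body(x)
        return self.policy(h), self.value(h).squeeze(-1)

def a3c_loss(model, states, actions, returns, entropy_beta=0.01):
    """Actor-critic loss over a rollout: policy gradient with an advantage
    baseline, value regression, and an entropy bonus for exploration."""
    logits, values = model(states)
    log_probs = F.log_softmax(logits, dim=-1)
    probs = log_probs.exp()
    advantage = returns - values.detach()        # no gradient through the baseline
    policy_loss = -(log_probs.gather(1, actions.unsqueeze(1)).squeeze(1) * advantage).mean()
    value_loss = F.mse_loss(values, returns)
    entropy = -(probs * log_probs).sum(dim=-1).mean()
    return policy_loss + 0.5 * value_loss - entropy_beta * entropy

# Usage on a dummy rollout of length 8 (returns would be n-step bootstrapped in practice):
model = ActorCritic()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
states, actions, returns = torch.randn(8, 16), torch.randint(0, 4, (8,)), torch.randn(8)
loss = a3c_loss(model, states, actions, returns)
opt.zero_grad(); loss.backward(); opt.step()
```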
Funding: the National Natural Science Foundation of China under Grants No. 61272506 and No. 61170014, the Foundation of Key Program of MOE of China under Grant No. 311007, and the Natural Science Foundation of Beijing under Grant No. 4102041.
Abstract: In this study, aiming at the randomness and dynamics of Wearable Audio-oriented BodyNets (WA-BodyNets), stochastic differential game theory is applied to the problem of transmitted power control in consumer electronic devices. First, a stochastic differential game model is proposed for non-cooperative decentralized uplink power control with a wisdom regulation factor over WA-BodyNets with a one-hop star topology. This model aims to minimize the cost associated with a novel payoff function of a player, for which two cost functions are defined: functions of inherent power radiation and of accumulated power radiation damage. Second, the feedback Nash equilibrium solution of the proposed model and the constraint imposed by the player's Quality of Service (QoS) requirement, based on an SIR threshold, are derived by solving the Fleming-Bellman-Isaacs partial differential equations. Furthermore, the Markov property of the optimal feedback strategies in this model is verified. The simulation results show that the proposed game model is effective and feasible for controlling the transmitted power of WA-BodyNets.
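In schematic form, a feedback Nash equilibrium of an N-player stochastic differential game is characterized by a coupled system of Hamilton-Jacobi (Fleming-Bellman-Isaacs) equations; the generic version below uses illustrative notation (cost-minimizing players, uncontrolled diffusion) rather than the paper's own model.

```latex
% State dynamics and cost of player i (schematic)
dx_t = f\!\left(t,x_t,u^1_t,\dots,u^N_t\right)dt + \sigma(t,x_t)\,dW_t,
\qquad
J_i = \mathbb{E}\!\int_0^T L_i\!\left(t,x_t,u^1_t,\dots,u^N_t\right)dt .

% Coupled HJB system for the value functions V_i, with the other players fixed
% at their feedback strategies u^{-i,*}(t,x):
-\,\partial_t V_i
=\min_{u^i}\Big\{ L_i\big(t,x,u^i,u^{-i,*}(t,x)\big)
+\nabla V_i\cdot f\big(t,x,u^i,u^{-i,*}(t,x)\big)\Big\}
+\tfrac12\,\mathrm{tr}\!\left(\sigma\sigma^{\top}\nabla^2 V_i\right),
\qquad V_i(T,x)=g_i(x),
```

and the minimizer in each equation defines the feedback strategy $u^{i,*}(t,x)$.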
Funding: the Young Research Foundation (201201130) of the Jilin Provincial Science & Technology Department and the Research Foundation (2011LG17) of Changchun University of Technology.
Abstract: In this paper, we deal with a class of two-player zero-sum linear quadratic stochastic differential game problems. We show that an open-loop saddle point exists if and only if the lower and upper values exist.
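For context, writing $J(u,v)$ for the quadratic cost that player $u$ minimizes and player $v$ maximizes, the objects mentioned above are, in generic notation:

```latex
% Lower and upper values of the zero-sum game
\underline{V}=\sup_{v}\inf_{u} J(u,v),
\qquad
\overline{V}=\inf_{u}\sup_{v} J(u,v).

% (u^*,v^*) is an open-loop saddle point if
J(u^*,v)\;\le\; J(u^*,v^*)\;\le\; J(u,v^*)
\quad\text{for all admissible } u,\,v,
```

and whenever a saddle point exists, the standard inequality chain forces $\underline{V}=\overline{V}=J(u^*,v^*)$.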
Funding: supported by the National Natural Science Foundation of China (No. 62101601).
Abstract: Peer-to-peer computation offloading is a promising approach that enables resource-limited Internet of Things (IoT) devices to offload their computation-intensive tasks to idle peer devices in proximity. Different from dedicated servers, the spare computation resources offered by peer devices are random and intermittent, which affects the offloading performance. The mutual interference caused by multiple simultaneous offloading requestors sharing the same wireless channel further complicates the offloading decisions. In this work, we investigate the opportunistic peer-to-peer task offloading problem by jointly considering the stochastic task arrivals, dynamic inter-user interference, and opportunistic availability of peer devices. Each requestor decides on both its local computation frequency and its offloading transmission power to minimize its own expected long-term cost of task completion, which takes into account its energy consumption, task delay, and task loss due to buffer overflow. The dynamic decision process among multiple requestors is formulated as a stochastic game. By constructing post-decision states, a decentralized online offloading algorithm is proposed in which each requestor, as an independent learning agent, learns to approach the optimal strategies from its local observations. Simulation results under different system parameter configurations demonstrate that the proposed online algorithm outperforms some existing algorithms, especially in scenarios with a large task arrival probability or a small helper availability probability.
Funding: the National Natural Science Foundation of China, the Outstanding Young Teachers Program of the Ministry of Education of China, the Special Fund for the Ph.D. Program of the Ministry of Education of China, and the Fok Ying Tung Education Foundation.
Abstract: The existence and uniqueness of the solutions of a class of forward-backward stochastic differential equations driven by a Brownian motion and a Poisson process were established under monotone conditions. These results were then applied to nonzero-sum differential games with random jumps to obtain the explicit form of the open-loop Nash equilibrium point via the solution of the forward-backward stochastic differential equations.
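A fully coupled forward-backward system driven by a Brownian motion $W$ and a compensated Poisson random measure $\tilde N$ takes, in generic notation, the form below; the monotone conditions referred to above are structural assumptions on the coefficients that yield existence and uniqueness of the adapted solution $(x,y,z,k)$.

```latex
\begin{cases}
dx_t = b(t,x_t,y_t,z_t,k_t)\,dt + \sigma(t,x_t,y_t,z_t,k_t)\,dW_t
       + \displaystyle\int_{E} c(t,x_{t-},y_{t-},e)\,\tilde N(dt,de), & x_0 = a,\\[2mm]
-\,dy_t = f(t,x_t,y_t,z_t,k_t)\,dt - z_t\,dW_t
       - \displaystyle\int_{E} k_t(e)\,\tilde N(dt,de), & y_T = \Phi(x_T).
\end{cases}
```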
Funding: supported by the Doctoral Foundation of the University of Jinan (XBS1213) and the National Natural Science Foundation of China (11101242).
Abstract: A necessary maximum principle is given for nonzero-sum stochastic differential games with random jumps. The result is applied to solve the H2/H∞ control problem for stochastic systems with random jumps. A necessary and sufficient condition for the existence of a unique solution to the H2/H∞ control problem is derived. The resulting solution is given by the solution of an uncontrolled forward-backward stochastic differential equation with random jumps.
Funding: supported in part by the National Natural Science Foundation of China under Grant Nos. 62122043 and 62192753, in part by the Natural Science Foundation of Shandong Province for Distinguished Young Scholars under Grant No. ZR2022JQ31, and in part by the Innovative Research Groups of the National Natural Science Foundation of China under Grant No. 61821004.
Abstract: In this paper, the authors design a reinforcement learning algorithm to solve the adaptive linear-quadratic stochastic n-player non-zero-sum differential game with completely unknown dynamics. For each player, a critic network is used to estimate the Q-function, and an actor network is used to estimate the control input. A model-free online Q-learning algorithm is obtained for solving this kind of problem. It is proved that, under some mild conditions, the system state and the weight estimation errors are uniformly ultimately bounded. A simulation with five players is given to verify the effectiveness of the algorithm.
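As generic background (not the authors' exact parameterization), model-free Q-learning in LQ games usually exploits the fact that each player's Q-function is quadratic in the state and the joint control, so the critic only has to estimate a symmetric matrix and the actor improvement is linear feedback:

```latex
% Quadratic Q-function of player i for state x and joint control u=(u^1,\dots,u^n)
Q_i(x,u)=\begin{pmatrix} x\\ u\end{pmatrix}^{\!\top}
\begin{pmatrix} H_i^{xx} & H_i^{xu}\\ H_i^{ux} & H_i^{uu}\end{pmatrix}
\begin{pmatrix} x\\ u\end{pmatrix},

% Greedy (actor) improvement for player i, holding the other players fixed:
u^{i} = -\big(H_i^{u^iu^i}\big)^{-1}
\Big(H_i^{u^ix}\,x+\textstyle\sum_{j\neq i}H_i^{u^iu^j}\,u^{j}\Big).
```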
Funding: supported by the Russian Science Foundation (No. 22-21-00346).
Abstract: We consider a model of network formation as a stochastic game with random duration, proposed initially in Sun and Parilina (Autom Remote Control 82(6):1065–1082, 2021). In the model, the leader first suggests a joint project to the other players, i.e., the network connecting them. Second, the players are allowed to form fresh links with each other, updating the initially proposed network. The stage payoff of any player is defined depending on the network structure. There are two types of randomness in the network formation process: (i) links may fail to be formed with different probabilities although players intend to establish them, and (ii) the game process may terminate at any stage or transit to the next stage with a certain probability distribution. Finally, a network is formed as a result of the players' decisions and the realization of random variables. The cooperative version of the stochastic game is investigated. In particular, we examine the properties of subgame consistency as well as strong subgame consistency of the core. We provide a payment mechanism, or regularization of the core elements, to sustain its subgame consistency and avoid the players' deviations from the cooperative trajectory. In addition, the distribution procedure of the core elements is regularized in case there are negative payments, so that only nonnegative payments are made to the players at any stage. A sufficient condition for a strongly subgame consistent core is also obtained. We illustrate our theoretical results with a numerical example.
Funding: the Ministry of Education, Singapore, under AcRF TIER 1 Grant RG64/23, and the Eric and Wendy Schmidt AI in Science Postdoctoral Fellowship, a Schmidt Futures program, USA.
Abstract: Multi-agent reinforcement learning (MARL) has been a rapidly evolving field. This paper presents a comprehensive survey of MARL and its applications. We trace the historical evolution of MARL, highlight its progress, and discuss related survey works. Then, we review the existing works addressing inherent challenges and those focusing on diverse applications. Some representative stochastic games, MARL methods, spatial forms of MARL, and task classifications are revisited. We then conduct an in-depth exploration of the variety of challenges encountered in MARL applications. We also address critical operational aspects, such as hyperparameter tuning and computational complexity, which are pivotal in practical implementations of MARL. Afterward, we give a thorough overview of the applications of MARL to intelligent machines and devices, chemical engineering, biotechnology, healthcare, and societal issues, which highlights the extensive potential and relevance of MARL within both current and future technological contexts. Our survey also encompasses a detailed examination of benchmark environments used in MARL research, which are instrumental in evaluating MARL algorithms and demonstrate the adaptability of MARL to diverse application scenarios. In the end, we give our prospects for MARL and discuss related techniques and potential future applications.
Abstract: The strategy evolution process of game players is highly uncertain due to random emergent situations and other external disturbances. This paper investigates strategy interaction and behavioral decision-making among game players in simulated confrontation scenarios within a random interference environment. It considers the possible risks that random disturbances pose to the autonomous decision-making of game players, as well as the impact of the participants' manipulative behaviors on the state changes of the players. A nonlinear mathematical model is established to describe the strategy decision-making process of the participants in this scenario. Subsequently, the strategy selection interaction relationship, strategy evolution stability, and dynamic decision-making process of the game players are investigated and verified by simulation experiments. The results show that maneuver-related parameters and random environmental interference factors have different effects on the selection and evolutionary speed of the agents' strategies. Especially in a highly uncertain environment, even small information asymmetries or miscalculations may have a significant impact on decision-making. This confirms the feasibility and effectiveness of the proposed method, which can better explain the behavioral decision-making process of the agents during interaction. This study provides feasibility analysis ideas and theoretical references for improving multi-agent interactive decision-making and the interpretability of game system models.
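The abstract does not reproduce the nonlinear model; as a purely illustrative sketch of strategy evolution under random interference (the payoff matrix, noise structure, and step sizes are assumptions), one can simulate a noise-perturbed replicator equation with the Euler-Maruyama scheme:

```python
import numpy as np

rng = np.random.default_rng(1)

# Payoff matrix of a symmetric 2x2 game (illustrative values).
A = np.array([[3.0, 0.0],
              [5.0, 1.0]])

def simulate(x0=0.5, sigma=0.15, dt=0.01, T=50.0):
    """Euler-Maruyama simulation of noisy replicator dynamics for the share x
    of strategy 0; sigma scales the random environmental interference."""
    n = int(T / dt)
    x = np.empty(n + 1)
    x[0] = x0
    for k in range(n):
        p = np.array([x[k], 1.0 - x[k]])
        f = A @ p                       # fitness of each pure strategy
        phi = p @ f                     # average fitness
        drift = x[k] * (f[0] - phi)     # replicator drift for strategy 0
        noise = sigma * x[k] * (1.0 - x[k]) * rng.normal()  # interference term
        x[k + 1] = np.clip(x[k] + drift * dt + noise * np.sqrt(dt), 0.0, 1.0)
    return x

trajectory = simulate()
print(f"final share of strategy 0: {trajectory[-1]:.3f}")
```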
Funding: supported in part by the Natural Science Foundation of China under Grants 61801243, 61671144, and 61971238, by the China Postdoctoral Science Foundation under Grant 2019M651914, by the Natural Science Foundation of the Jiangsu Higher Education Institutions of China under Grant 18KJB510026, and by the Foundation of Nanjing University of Posts and Telecommunications under Grant NY218124.
Abstract: The application of unmanned aerial vehicle (UAV)-mounted base stations is emerging as an effective solution to provide wireless communication service for a target region containing smart objects (SOs) in the Internet of Things (IoT). This paper investigates the efficient deployment problem of multiple UAVs for IoT communication in a dynamic environment. We first define a measure of UAV-to-SO communication performance in the target region, which is regarded as the optimization objective. The state of an SO is active when it needs to transmit or receive data, and silent otherwise. The switch between the two states occurs with a certain probability, which results in a dynamic communication environment. In this dynamic environment, the active states of the SOs cannot be known by the UAVs in advance, and only neighbouring UAVs can communicate with each other. To overcome these challenges in the deployment, we leverage a game-theoretic learning approach to solve the position-selection problem. The problem is modeled as a stochastic game, which is proven to be an exact potential game admitting the best Nash equilibria (NE). Furthermore, a distributed position optimization algorithm is proposed, which converges to a pure-strategy NE. Numerical results demonstrate the excellent performance of the proposed algorithm.
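The structural fact exploited above is the exact potential property, which in generic notation reads:

```latex
u_i\big(a_i',a_{-i}\big)-u_i\big(a_i,a_{-i}\big)
\;=\;\Phi\big(a_i',a_{-i}\big)-\Phi\big(a_i,a_{-i}\big)
\qquad\text{for every player } i \text{ and all } a_i,\,a_i',\,a_{-i}.
```

Consequently, every unilateral improvement step raises the potential $\Phi$, every maximizer of $\Phi$ is a pure-strategy Nash equilibrium, and best-response-type distributed updates terminate at such an equilibrium in finite games, which is the general mechanism behind convergence guarantees of this kind.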
Funding: supported by the National Natural Science Foundation (Grant No. 10371067), the Youth Teacher Foundation of the Fok Ying Tung Education Foundation, the Excellent Young Teachers Program, and the Doctoral Program Foundation of MOE and Shandong Province, China.
Abstract: In this paper, we use the solutions of forward-backward stochastic differential equations to obtain the explicit form of the optimal control for the linear quadratic stochastic optimal control problem and the open-loop Nash equilibrium point for the nonzero-sum differential game problem. We also discuss the solvability of the generalized Riccati equation system and give the linear feedback regulator for the optimal control problem using the solution of this kind of Riccati equation system.
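As a reference for the kind of linear feedback regulator meant above, consider the simplest stochastic LQ special case with dynamics $dx_t=(Ax_t+Bu_t)\,dt+\sigma\,dW_t$ and cost $\mathbb{E}\int_0^T (x_t^\top Qx_t+u_t^\top Ru_t)\,dt+\mathbb{E}\,x_T^\top Gx_T$; this is a simplified illustration, not the generalized Riccati system studied in the paper:

```latex
\dot P_t + A^{\top}P_t + P_t A - P_t B R^{-1} B^{\top} P_t + Q = 0,
\qquad P_T = G,
\qquad u_t^{*} = -R^{-1}B^{\top}P_t\,x_t .
```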
Funding: supported by the National Natural Science Foundation of China under Grant Nos. 71231007, 71071119, and 60574071.
Abstract: Traditional evolutionary games assume a uniform interaction rate, which means that the rate at which individuals meet and interact is independent of their strategies. But in some systems, especially biological systems, the players interact with each other discriminately. Taylor and Nowak (2006) were the first to establish the corresponding non-uniform interaction rate model by allowing the interaction rates to depend on strategies. Their model is based on replicator dynamics, which assumes an infinite population. In reality, however, the number of individuals in a population is always finite, and there is some random interference in the individuals' strategy selection process. Therefore, it is more practical to establish the corresponding stochastic evolutionary model in finite populations. In fact, the analysis of evolutionary games in a finite population is more difficult. As Taylor and Nowak said in the outlook section of their paper, "The analysis of non-uniform interaction rates should be extended to stochastic game dynamics of finite populations." In this paper, we do exactly this. We extend Taylor and Nowak's model from the infinite to the finite case, focusing especially on the influence of non-uniform connection characteristics on the evolutionarily stable state of the system. We model the strategy evolution process of the population by a continuous ergodic Markov process. Based on the limit distribution of the process, we can give the evolutionarily stable state of the system. We give a complete classification of the symmetric 2×2 games. For each case, the corresponding limit distribution of the Markov-based process is given when the noise intensity is small enough. In contrast with most of the evolutionary game literature, which relies on simulation, all our results are analytical. In particular, in the dominant-case game, coexistence of the two strategies may become an evolutionarily stable state in our model. This result can be used to explain the emergence of cooperation in the Prisoner's Dilemma game to some extent. Some specific examples are given to illustrate our results.
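As an illustrative companion to the analytical results (the payoffs, interaction rates, update rule, and mutation rate below are assumptions, not the paper's model), the limit distribution of a finite-population imitation process with non-uniform interaction rates can be computed numerically from the birth-death product formula:

```python
import numpy as np

# Symmetric 2x2 game payoffs: A vs A -> a, A vs B -> b, B vs A -> c, B vs B -> d
a, b, c, d = 3.0, 1.0, 4.0, 2.0
r1, r2, r3 = 1.0, 0.5, 1.0   # interaction rates: A-A, A-B, B-B (non-uniform)
N, beta, mu = 50, 1.0, 1e-3  # population size, selection intensity, mutation rate

def fitness(i):
    """Average payoffs of A- and B-players when i of N individuals play A,
    with encounter frequencies weighted by the interaction rates."""
    fA = (r1 * a * (i - 1) + r2 * b * (N - i)) / (r1 * (i - 1) + r2 * (N - i))
    fB = (r2 * c * i + r3 * d * (N - i - 1)) / (r2 * i + r3 * (N - i - 1))
    return fA, fB

def step_probs(i):
    """Birth-death transition probabilities of an imitation process with a
    Fermi update and a small mutation rate that keeps the chain ergodic."""
    if 0 < i < N:
        fA, fB = fitness(i)
        imitate_A = 1.0 / (1.0 + np.exp(-beta * (fA - fB)))
        up = (N - i) / N * (i / N * imitate_A + mu)
        down = i / N * ((N - i) / N * (1.0 - imitate_A) + mu)
    else:  # only mutation can leave the monomorphic states
        up = mu if i == 0 else 0.0
        down = mu if i == N else 0.0
    return up, down

# Stationary (limit) distribution of the birth-death chain via the product formula.
weights = np.ones(N + 1)
for i in range(1, N + 1):
    up_prev, _ = step_probs(i - 1)
    _, down_cur = step_probs(i)
    weights[i] = weights[i - 1] * up_prev / down_cur
stationary = weights / weights.sum()
print("mass near all-A:", stationary[-3:].sum(), " mass near all-B:", stationary[:3].sum())
```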
Abstract: We study large-population stochastic dynamic games where the so-called Nash certainty equivalence based control laws are implemented by the individual players. We first show a martingale property for the limiting control problem of a single agent and then perform averaging across the population; this procedure leads to a constant value for the martingale, which shows an invariance property of the population behavior induced by the Nash strategies.
Funding: the National Natural Science Foundation of China under Grant No. 11701214 and the Shandong Provincial Natural Science Foundation, China, under Grant No. ZR2019MA045.
Abstract: This technical note is concerned with the maximum principle for a non-zero-sum stochastic differential game with discrete and distributed delays. Not only the state variable but also the control variables of the players involve discrete and distributed delays. By virtue of the duality method and generalized anticipated backward stochastic differential equations, the author establishes a necessary maximum principle and a sufficient verification theorem. To illustrate the theoretical results, the author applies them to a dynamic advertising game problem.
Funding: supported by the National Basic Research Program of China (973 Program) (Grant No. 2011CB808002), the National Natural Science Foundation of China (Grant Nos. 11231007, 11301298, 11471231, 11401404, 11371226, 11071145, and 11231005), the China Postdoctoral Science Foundation (Grant No. 2014M562321), the Foundation for Innovative Research Groups of the National Natural Science Foundation of China (Grant No. 11221061), and the Program for Introducing Talents of Discipline to Universities (the National 111 Project of China's Higher Education) (Grant No. B12023).
Abstract: This paper studies the linear quadratic game problem for stochastic Volterra integral equations (SVIEs for short), where necessary and sufficient conditions for the existence of saddle points are derived in two different ways. As a consequence, the open problems raised by Chen and Yong (2007) are solved. To characterize the saddle points more clearly, coupled forward-backward stochastic Volterra integral equations and stochastic Fredholm-Volterra integral equations are introduced. Compared with deterministic game problems, some new terms arising in the derivation of the latter equations reflect well the essential nature of stochastic systems. Moreover, our representations and arguments are new even in the classical SDE case.
Funding: supported by the National Natural Science Foundation of China under Grant Nos. 11701040, 11871010, and 61871058, and the Fundamental Research Funds for the Central Universities under Grant No. 2019XDA11.
Abstract: This paper focuses on zero-sum stochastic differential games in the framework of forward-backward stochastic differential equations on a finite time horizon, with both players adopting impulse controls. By means of BSDE methods, in particular the notion of Peng's stochastic backward semigroups, the authors prove a dynamic programming principle for both the upper and the lower value functions of the game. The upper and lower value functions are then shown to be the unique viscosity solutions of the Hamilton-Jacobi-Bellman-Isaacs equations with a double obstacle. As a consequence, the uniqueness implies that the upper and lower value functions coincide and the game admits a value.