Abstract: The aim of this paper is to reveal the mechanism of compromise and change in coordination when players agree in general but disagree on the method of coordination. When players agree on the need to collaborate but are in conflict over the specific method, one player must always compromise. This situation is known in game theory as the Battle of the Sexes. It has long been believed that once an agreement is reached under such circumstances, the players have no incentive to withdraw from it. However, this study shows that this belief does not always hold if the players are able to revise the outcome of their negotiations later. Game theory is used as an analytical framework in a wide range of fields to analyze the success or failure of coordination. However, compared with the possibility of betrayal illustrated by the well-known Prisoner's Dilemma, conflict over the specific method of coordination has rarely been discussed, although such situations are often observed in today's interdependent world. The repeated Battle of the Sexes game presented in this study offers a useful framework for analyzing such conflict.
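To make the Battle of the Sexes structure concrete, here is a minimal sketch of the one-shot stage game. The payoff numbers are illustrative assumptions, not values from the paper. The sketch only enumerates the pure-strategy Nash equilibria and does not model the paper's revision of agreements over repeated play.

```python
from itertools import product

# Hypothetical payoffs (row player, column player); not taken from the paper.
# Action 0 = the method the row player prefers, action 1 = the method the
# column player prefers. Miscoordination pays both players nothing.
PAYOFFS = {
    (0, 0): (2, 1),
    (0, 1): (0, 0),
    (1, 0): (0, 0),
    (1, 1): (1, 2),
}
ACTIONS = (0, 1)


def pure_nash_equilibria(payoffs, actions=ACTIONS):
    """Return the action profiles from which no player gains by deviating alone."""
    equilibria = []
    for a, b in product(actions, repeat=2):
        # Row player compares action a against every alternative, holding b fixed.
        row_ok = all(payoffs[(a, b)][0] >= payoffs[(alt, b)][0] for alt in actions)
        # Column player compares action b against every alternative, holding a fixed.
        col_ok = all(payoffs[(a, b)][1] >= payoffs[(a, alt)][1] for alt in actions)
        if row_ok and col_ok:
            equilibria.append((a, b))
    return equilibria


if __name__ == "__main__":
    print(pure_nash_equilibria(PAYOFFS))
```

Both coordinated outcomes are equilibria under these assumed payoffs, and each favors a different player, which is why one side has to compromise.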
Funding: We also acknowledge the support of the National Natural Science Foundation of China (Grant No. 60574071).
Abstract: Considering the dynamic character of repeated games and of Markov processes, this paper presents a novel dynamic decision model for symmetric repeated games. In this model, players' actions are mapped to a Markov decision process with payoffs, and the Boltzmann distribution is introduced. We use this dynamic model, which differs from existing ones, to study the iterated prisoner's dilemma. The results show that the decision model can be applied successfully to symmetric repeated games and is capable of adaptive learning.
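The abstract does not say how the Boltzmann distribution enters the model. One common use, assumed here, is Boltzmann (softmax) action selection, where an action is chosen with probability proportional to exp(value / temperature). The function names and the temperature parameter below are illustrative, not the paper's notation.

```python
import math
import random


def boltzmann_probabilities(values, temperature):
    """Map action values to probabilities p_i proportional to exp(v_i / T)."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    # Subtracting the maximum keeps exp() from overflowing; the
    # probabilities are unchanged because the factor cancels out.
    top = max(values)
    weights = [math.exp((v - top) / temperature) for v in values]
    total = sum(weights)
    return [w / total for w in weights]


def choose_action(values, temperature, rng=random):
    """Sample an action index according to the Boltzmann probabilities."""
    probs = boltzmann_probabilities(values, temperature)
    return rng.choices(range(len(values)), weights=probs, k=1)[0]


if __name__ == "__main__":
    # Hypothetical estimated values for "cooperate" and "defect".
    estimates = [3.0, 1.0]
    print(boltzmann_probabilities(estimates, temperature=1.0))
    print(choose_action(estimates, temperature=1.0, rng=random.Random(0)))
```

A high temperature makes the choice nearly uniform (more exploration). A low temperature concentrates probability on the highest-valued action.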
Abstract: In this paper, we characterize the players' behavior in the stock market using a repeated game model with asymmetric information. We show that the discounted price process of the stock is a martingale driven by Brownian motion, and give an endogenous explanation for the random fluctuation of the stock price: the randomization in the market is due to the randomization in the strategy of the informed player, who hopes to avoid revealing his private information. On this basis, by further studying the corresponding option pricing problem, we give an expression for the function φ.
Funding: Supported by the National Key Research and Development Program of China under Grant No. 2022YFA1004700, the National Natural Science Foundation of China under Grant No. 62173250, and the Shanghai Municipal Science and Technology Major Project under Grant No. 2021SHZDZX0100.
Abstract: This paper focuses on the performance of equalizer zero-determinant (ZD) strategies in discounted repeated Stackelberg asymmetric games. In the leader-follower adversarial scenario, the strong Stackelberg equilibrium (SSE), which derives from the opponents' best response (BR), is technically the optimal strategy for the leader. However, computing an SSE strategy may be difficult, since it requires solving a mixed-integer program and has exponential complexity in the number of states. To this end, the authors propose an equalizer ZD strategy, which can unilaterally restrict the opponent's expected utility. The authors first study the existence of an equalizer ZD strategy in one-to-one situations and analyze an upper bound on its performance relative to the baseline SSE strategy. The authors then turn to multi-player models in which one player adopts an equalizer ZD strategy, give bounds on the weighted sum of the opponents' utilities, and compare it with the SSE strategy. Finally, the authors give simulations on unmanned aerial vehicles (UAVs) and moving target defense (MTD) to verify the effectiveness of the proposed approach.
Abstract: Using the fundamental theory and analytical methods of repeated games with incomplete information, this paper introduces incomplete information into repeated games and establishes a two-stage dynamic game model of the local authority and the coal mine owner. The analysis indicates that, as long as the state establishes a corresponding reward-and-punishment incentive mechanism for the local authority departments responsible for the work, the local authority reports coal mine safety accidents on time. The conclusion about whether the local government behaves cooperatively or non-cooperatively changes once incomplete information is introduced. Only when the local authority fulfills its responsibility can accidents be controlled effectively. Once such cooperation by the local government appears, the cost and difficulty of safety supervision for the state decrease greatly.
Funding: Supported by the National Natural Science Foundation of China (Grant Nos. 61004098 and 11222543); the Program for New Century Excellent Talents in Universities of China (Grant No. NCET-11-0070); the Special Project of Youth Science and Technology Innovation Research Team of Sichuan Province, China (Grant No. 2013TD0006); and the Research Foundation of UESTC and Scholars Program of Hong Kong (Grant No. G-YZ4D).
Abstract: Repeated games describe situations where players interact with each other in a dynamic pattern and make decisions according to the outcomes of previous stage games. Very recently, Press and Dyson revealed a new class of zero-determinant (ZD) strategies for repeated games, which can enforce a fixed linear relationship between the expected payoffs of two players, indicating that a smart player can unilaterally control her unwitting co-player's payoff [Proc. Natl. Acad. Sci. USA 109, 10409 (2012)]. The theory of ZD strategies provides a novel viewpoint for describing interactions among players and fundamentally changes the research paradigm of game theory. In this brief survey, we first introduce the mathematical framework of ZD strategies and review the properties and constraints of two special classes of ZD strategies, called pinning strategies and extortion strategies. We then review representative research progress, including robustness analysis, cooperative ZD strategy analysis, and evolutionary stability analysis. Finally, we discuss some significant extensions of ZD strategies, including multi-player ZD strategies and ZD strategies under noise. Challenges in related research fields are also listed.
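As an illustration of the pinning idea, the sketch below builds a memory-one strategy for the iterated prisoner's dilemma that fixes the co-player's long-run payoff, following the closed-form construction in Press and Dyson's paper. The payoff values (R, S, T, P) = (3, 0, 5, 1), the function names, and the use of power iteration for the stationary distribution are assumptions made for this example. It covers only the undiscounted two-player case.

```python
# Conventional prisoner's dilemma payoffs; an assumption for this sketch.
R, S, T, P = 3.0, 0.0, 5.0, 1.0


def equalizer_strategy(p1, p4):
    """Return ((p_CC, p_CD, p_DC, p_DD), pinned co-player payoff).

    Each entry is X's probability of cooperating after that outcome
    (X's move listed first). p1 and p4 are free; p2 and p3 follow from them.
    """
    if p1 == 1.0 and p4 == 0.0:
        raise ValueError("p1 = 1 and p4 = 0 leave the pinned payoff undefined")
    p2 = (p1 * (T - P) - (1 + p4) * (T - R)) / (R - P)
    p3 = ((1 - p1) * (P - S) + p4 * (R - S)) / (R - P)
    strategy = (p1, p2, p3, p4)
    if not all(0.0 <= x <= 1.0 for x in strategy):
        raise ValueError("this choice of p1, p4 does not give valid probabilities")
    pinned = ((1 - p1) * P + p4 * R) / (1 - p1 + p4)
    return strategy, pinned


def long_run_payoffs(p, q, steps=5000):
    """Long-run average payoffs (X, Y) when X plays p and Y plays q.

    q is indexed from Y's own viewpoint (Y's move first). The stationary
    distribution is approximated by power iteration from a uniform start.
    """
    swap = (0, 2, 1, 3)  # X's state CC, CD, DC, DD as seen by Y
    dist = [0.25] * 4
    for _ in range(steps):
        nxt = [0.0] * 4
        for state, mass in enumerate(dist):
            x = p[state]        # probability that X cooperates next
            y = q[swap[state]]  # probability that Y cooperates next
            nxt[0] += mass * x * y
            nxt[1] += mass * x * (1 - y)
            nxt[2] += mass * (1 - x) * y
            nxt[3] += mass * (1 - x) * (1 - y)
        dist = nxt
    payoff_x = sum(d * u for d, u in zip(dist, (R, S, T, P)))
    payoff_y = sum(d * u for d, u in zip(dist, (R, T, S, P)))
    return payoff_x, payoff_y


if __name__ == "__main__":
    strategy, pinned = equalizer_strategy(0.9, 0.1)
    for q in [(0.5, 0.5, 0.5, 0.5), (0.9, 0.1, 0.8, 0.3), (0.0, 0.0, 0.0, 0.0)]:
        print(q, long_run_payoffs(strategy, q)[1], pinned)
```

Whatever memory-one strategy the co-player tries in this sketch, the co-player's long-run payoff stays at the pinned value. Only X's own payoff varies.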
Funding: Supported by the Beijing Key Laboratory of Work Safety Intelligent Monitoring (Beijing University of Posts and Telecommunications).
Abstract: Cognitive radio (CR) is a promising solution for improving spectrum utilization. A cognitive radio network includes the primary user (PU) system, which has authorized spectrum, and the secondary user (SU) system, which does not. When SUs want to use the spectrum, they have to find idle channels that are not occupied by PUs. The QoS of the SUs is therefore affected not only by the characteristics of their own services but also by the behavior of the PUs. Currently, the M-LWDF algorithm is widely used in scheduling to ensure the quality of SU services. However, M-LWDF does not fully consider the differences among SUs. SUs that are in the middle of a communication but have to change channel because a PU has returned should have higher scheduling priority. In this paper, we put forward an improved algorithm based on M-LWDF that gives higher scheduling priority to SUs whose communication is in progress, in order to guarantee their QoS. Simulation results show that the improved algorithm effectively decreases the dropping rate and improves the QoS of the SUs and the performance of the whole system.
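The scheduling idea can be sketched as follows. The baseline uses the standard M-LWDF metric, a_i * W_i(t) * r_i(t) / avg_rate_i with a_i = -log(delta_i) / T_i. The abstract does not say how the improved algorithm raises the priority of interrupted SUs, so the multiplicative `boost` factor and all field names here are hypothetical stand-ins, not the paper's rule.

```python
import math
from dataclasses import dataclass


@dataclass
class SecondaryUser:
    name: str
    hol_delay: float           # head-of-line packet delay W_i(t), in seconds
    inst_rate: float           # achievable rate on the channel right now, r_i(t)
    avg_rate: float            # long-run average rate of this user
    delay_bound: float         # delay budget T_i, in seconds
    max_violation_prob: float  # tolerated probability delta_i of exceeding T_i
    interrupted: bool = False  # True if a returning PU forced a channel change


def mlwdf_metric(user):
    """Standard M-LWDF metric: a_i * W_i(t) * r_i(t) / average rate."""
    a = -math.log(user.max_violation_prob) / user.delay_bound
    return a * user.hol_delay * user.inst_rate / user.avg_rate


def priority(user, boost=2.0):
    """Hypothetical modification: scale the metric for interrupted SUs."""
    metric = mlwdf_metric(user)
    return metric * boost if user.interrupted else metric


def schedule(users, boost=2.0):
    """Pick the user with the highest priority for the next slot."""
    return max(users, key=lambda u: priority(u, boost))


if __name__ == "__main__":
    users = [
        SecondaryUser("su1", 0.02, 1.0, 1.0, 0.1, 0.05),
        SecondaryUser("su2", 0.02, 1.0, 1.0, 0.1, 0.05, interrupted=True),
    ]
    print(schedule(users).name)
```

With otherwise identical users, the interrupted one is served first. With `boost=1.0` the sketch reduces to plain M-LWDF.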
Funding: Supported in part by the National Science Fund for Distinguished Young Scholars under Grant No. 60725105; the National Key Basic Research Program of China (973 Program) under Grant No. 2009CB320404; the Program for Changjiang Scholars and Innovative Research Team in University under Grant No. IRT0852; the National Natural Science Foundation of China under Grants No. 60972047 and 61072068; and the 111 Project under Grant No. B08038.
Abstract: This paper proposes a negotiation-based TDMA scheme for ad hoc networks, which is modeled as an asynchronous myopic repeated game. Compared with traditional centralized TDMA schemes, our scheme operates in a decentralized manner and adapts to topology changes. Simulation results show that, in terms of coloring quality, our scheme performs close to the classical centralized algorithms at much lower complexity.
Funding: Supported by the National Defense Basic Research Foundation of China (C2720061361).
Abstract: Focusing on packet-dropping attacks in sensor networks, we model resistance to such attacks as a repeated game, under the assumption that sensor nodes are rational. The model deters malicious nodes from attacking by establishing a punishment mechanism, and drives the sensor network toward a cooperative Nash equilibrium. Simulation results show that, with reasonably chosen configuration parameters, the proposed model effectively resists dropping packets attacks (DPA).
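The abstract does not describe the punishment mechanism in detail. As a textbook stand-in, the sketch below checks when a grim-trigger punishment (permanent non-cooperation after one detected drop) makes forwarding packets the rational choice. The payoff names T, R, P and the discount factor delta are assumptions for illustration and are not parameters from the paper.

```python
def critical_discount(temptation, reward, punishment):
    """Smallest discount factor at which grim trigger sustains cooperation.

    temptation: one-round payoff from dropping packets while others forward
    reward:     per-round payoff when every node forwards
    punishment: per-round payoff once the node is being punished
    Requires temptation > reward > punishment.
    """
    return (temptation - reward) / (temptation - punishment)


def cooperation_is_stable(temptation, reward, punishment, delta):
    """Compare the discounted value of always forwarding with deviating once."""
    if not 0.0 <= delta < 1.0:
        raise ValueError("delta must lie in [0, 1)")
    cooperate_forever = reward / (1 - delta)
    deviate_once = temptation + delta * punishment / (1 - delta)
    return cooperate_forever >= deviate_once


if __name__ == "__main__":
    # Hypothetical payoffs for a rational sensor node.
    print(critical_discount(5.0, 3.0, 1.0))
    print(cooperation_is_stable(5.0, 3.0, 1.0, delta=0.6))
    print(cooperation_is_stable(5.0, 3.0, 1.0, delta=0.4))
```

A node that values future rounds enough (delta at or above the critical value) loses more from being punished than it gains from dropping packets once.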
Funding: The paper is supported by the National Natural Science Foundation of China [grant nos. 61572095, 61877007].
Abstract: This paper investigates infinite-horizon repeated security games with one defender and multiple attacker types. Incomplete information makes the attackers' behaviour uncertain for the defender. Under this uncertainty, we adopt a worst-case analysis to minimise the defender's regret w.r.t. each attacker type. We wish to keep the regret especially small w.r.t. one attacker type, at the cost of modest additional overhead compared to the others. The trade-off among these objectives requires us to build a Multi-Objective Repeated Security Game (MORSG) model. To parameterise the regret Pareto frontier, we combine different weight vectors with the different objectives and build a linear programming approach. By running the Q-iteration procedure on the linear program for each weight vector, the optimal regret Pareto frontier can be computed. We also propose an approximation approach, and the approximation analysis proves its effectiveness.
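The weight-vector idea can be illustrated with plain weighted-sum scalarization over a finite list of candidate strategies. This replaces the paper's linear program and Q-iteration with simple enumeration. The regret vectors below are made-up numbers, with one entry per attacker type.

```python
def weighted_regret(regrets, weights):
    """Scalarize a regret vector (one entry per attacker type)."""
    return sum(w * r for w, r in zip(weights, regrets))


def frontier_by_weighted_sum(regret_vectors, weight_vectors):
    """For each weight vector, keep the candidate with the least weighted regret.

    With strictly positive weights every selected point is Pareto optimal,
    though points in non-convex parts of the frontier can be missed.
    """
    frontier = []
    for weights in weight_vectors:
        best = min(regret_vectors, key=lambda r: weighted_regret(r, weights))
        if best not in frontier:
            frontier.append(best)
    return frontier


if __name__ == "__main__":
    # Hypothetical regrets of four defender strategies against two attacker types.
    candidates = [(1.0, 4.0), (2.0, 2.0), (4.0, 1.0), (3.0, 3.0)]
    weights = [(0.9, 0.1), (0.5, 0.5), (0.1, 0.9)]
    print(frontier_by_weighted_sum(candidates, weights))
```

Placing most of the weight on one attacker type selects a strategy with especially small regret against that type, at the price of larger regret against the other.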
Funding: This work was supported by the National Key Research and Development Program (No. 2016YFB0901900) and the National Natural Science Foundation of China (No. 61733018). The authors would like to thank Prof. Peng Yi for his helpful suggestions.
Abstract: In this paper, we consider learning the inherent probability distribution of types via knowledge transfer in a two-player repeated Bayesian game, which is a basic model in network security. In the Bayesian game, the distribution of the attacker's types is unknown to the defender, and the defender aims to reconstruct the distribution from historical actions. It is difficult to calculate the distribution of types directly, since the distribution is coupled with a prediction function of the attacker in the game model. Thus, we seek help from an interrelated complete-information game, based on the idea of transfer learning. We provide two methods for estimating the prediction function with knowledge transfer under different concrete conditions. After obtaining the estimated prediction function, the defender can decouple the inherent distribution from the prediction function in the Bayesian game and, moreover, reconstruct the distribution of the attacker's types. Finally, we give numerical examples to illustrate the effectiveness of our methods.
Funding: This work was supported by the National Natural Science Foundation of China (No. U1866206).
Abstract: The power market is a typical imperfectly competitive market in which power suppliers gain higher profits through strategic bidding. Most existing studies assume that a power supplier has access to sufficient market information to derive an optimal bidding strategy. However, this assumption may not hold in reality, particularly when a power market is newly launched. To help power suppliers bid with limited information, a modified continuous action reinforcement learning automata algorithm is proposed. The algorithm introduces discretization and the Dyna structure into the continuous action reinforcement learning automata algorithm for easy implementation in a repeated game. Simulation results verify the effectiveness of the proposed learning algorithm.
Funding: Supported by the Open Research Fund from the Key Laboratory for Computer Network and Information Integration (Southeast University, Ministry of Education, China); the Fundamental Research Funds for the Central Universities; the National Key Technology R&D Program (2011BAK02B02-01); the National Key Technology R&D Program of China (2011BAK02B02); the Hi-Tech Research and Development Program of China (2012AA111902); the State Key Development Program for Basic Research of China (2011CB302902); the National Natural Science Foundation of China (61073180); and the National Science and Technology Major Project (2010ZX03006-002-03).
Abstract: In vehicular ad hoc networks (VANETs), traffic load is often unevenly distributed among access points (APs). Such load imbalance prevents the network from fully utilizing its capacity. To alleviate this imbalance, the paper introduces a novel pricing game model. The scenario studied is an intersection while the traffic light is green. Because vehicles are highly mobile and the network topology changes dynamically, the paper divides the green-light time into equal slots and calculates the APs' prices with the proposed pricing game in each time slot. The whole process forms a repeated game. The final equilibrium solution set is the APs' pricing strategy, and the paper claims that this equilibrium solution set can influence vehicles' AP selection and ensure load balancing among APs. Simulation results based on a realistic vehicular traffic model demonstrate the effectiveness of the game method.