Funding: Supported by the Industry-University-Research Cooperation Fund Project of the Eighth Research Institute of China Aerospace Science and Technology Corporation (USCAST2022-11) and the Aeronautical Science Foundation of China (20220001057001).
Abstract: This paper presents a novel cooperative value iteration (VI)-based adaptive dynamic programming method for multi-player differential game models, together with a convergence proof. The players are divided into two groups in the learning process and adapt their policies sequentially. The method removes the dependence on admissible initial policies, which is one of the main drawbacks of policy iteration (PI)-based frameworks. Furthermore, the algorithm enables the players to adapt their control policies without full knowledge of the other players' system parameters or control laws. The efficacy of the method is illustrated by three examples.
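As a rough illustration of why value iteration does not need an admissible (stabilizing) initial policy, the sketch below runs VI on a single-player, scalar, discrete-time linear-quadratic problem, starting from a zero value function. This is a textbook simplification and not the paper's multi-player differential-game algorithm; the dynamics `x_next = a*x + b*u`, the cost weights `q` and `r`, and the function names are all assumptions made for the example.

```python
def scalar_lq_value_iteration(a, b, q, r, tol=1e-12, max_iter=10_000):
    """Value iteration for V(x) = p * x**2 on x_next = a*x + b*u.

    Stage cost is q*x**2 + r*u**2 (q >= 0, r > 0). The iteration starts
    from p = 0, so no stabilizing initial policy is required.
    (Hypothetical helper, not taken from the paper.)
    """
    p = 0.0
    for _ in range(max_iter):
        # Bellman backup, minimized over u in closed form (Riccati recursion).
        p_next = q + a * a * p - (a * b * p) ** 2 / (r + b * b * p)
        if abs(p_next - p) < tol:
            return p_next
        p = p_next
    return p


def greedy_gain(a, b, r, p):
    """Feedback gain k of the policy u = -k*x that is greedy with respect to p."""
    return a * b * p / (r + b * b * p)


p_star = scalar_lq_value_iteration(a=1.0, b=1.0, q=1.0, r=1.0)
k_star = greedy_gain(1.0, 1.0, 1.0, p_star)
```

For `a = b = q = r = 1`, the fixed point satisfies p² - p - 1 = 0, so `p_star` approaches the positive root (1 + √5)/2, and the resulting closed-loop factor `a - b*k_star` has magnitude below one.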
Funding: Supported by the National Natural Science Foundation of China (Grant Nos. 61933010 and 61903301) and the Shaanxi Aerospace Flight Vehicle Design Key Laboratory.
Abstract: Cooperative autonomous air combat of multiple unmanned aerial vehicles (UAVs) is one of the main combat modes in future air warfare, and it becomes even more complicated when the situation changes rapidly and information about the opponents is uncertain. This paper therefore presents a cooperative decision-making method based on an incomplete-information dynamic game to generate maneuver strategies for multiple UAVs in air combat. First, a cooperative situation assessment model is presented to measure the overall combat situation. Second, an incomplete-information dynamic game model is proposed to model the dynamic process of air combat, and a dynamic Bayesian network is designed to infer the tactical intention of the opponent. Then, a reinforcement learning framework based on multi-agent deep deterministic policy gradient is established to obtain the perfect Bayes-Nash equilibrium solution of the air combat game model. Finally, a series of simulations is conducted to verify the effectiveness of the proposed method, and the simulation results show effective synergies and cooperative tactics.
Funding: Supported by the Foundation for Innovative Research Groups of the National Natural Science Foundation of China (NSFC) (61321002); the National Science Fund for Distinguished Young Scholars (60925011); Projects of Major International (Regional) Joint Research Program NSFC (61120106010); the Beijing Education Committee Cooperation Building Foundation Project; the Program for Changjiang Scholars and Innovative Research Team in University (IRT1208); the Chang Jiang Scholars Program; and the National Natural Science Foundation of China (61203078).
Abstract: Studies on multi-team antagonistic games (MTAGs) are still at an early stage, because this complicated problem involves not only incomplete information and conflicts of interest but also the selection of antagonistic targets. Building on previous research, this paper proposes a new framework, the dynamic multi-team antagonistic games with incomplete information (DMTAGII) model. For this model, the corresponding concept of perfect Bayesian Nash equilibrium (PBNE) is established and the existence of a PBNE is proved. In addition, an interactive iteration algorithm based on the idea of best response is introduced for solving the equilibrium. The scenario of multiple unmanned aerial vehicles (UAVs) against multiple military targets is then studied to solve tactical decision-making problems based on the DMTAGII model. In the modeling process, specific expressions for the strategy, status, and payoff functions of the games are considered, and the strategy is encoded to match the structure of a genetic algorithm so that the PBNE can be solved by combining the genetic algorithm with the interactive iteration algorithm. Finally, simulations verify the feasibility and effectiveness of the DMTAGII model. The calculated equilibrium strategies are also found to be realistic and can provide a reference for improving the autonomy of UAV systems.
Funding: Supported by the National Natural Science Foundation of China under Grant Nos. 61303074 and 61309013, and the Henan Province Science and Technology Project Funds under Grant No. 12210231002.
Abstract: Existing approaches to network defence policy selection based on incomplete-information game models ignore the type of the defender and quantify cost too simply, which leads to unreasonable defence policy selection. To address this problem, we use Bayesian game theory to model active defence policy selection, taking the types of both the attacker and the defender into consideration. In addition, the traditional quantization method is enhanced. We then calculate the equilibrium of the static Bayesian game. Based on an analysis of the equilibrium, we select the optimal defence policy by predicting the attacker's actions. The paper calculates the defence effectiveness of defence policies and provides a defence policy selection algorithm. Finally, we present an example to verify the effectiveness of the proposed method and model.
Funding: Supported by the Major Projects for Science and Technology Innovation 2030 (2018AAA0100805).
Abstract: The threat sequencing of multiple unmanned combat air vehicles (UCAVs) is a multi-attribute decision-making (MADM) problem. In the threat sequencing process, because of the strong confrontation and high dynamics of the air combat environment, the weight coefficients of the threat indicators are usually time-varying. Moreover, air combat data are difficult to obtain accurately. In this study, a threat sequencing method for multiple UCAVs is proposed based on game theory, taking incomplete information into account. First, a zero-sum game model between the decision maker (D) and nature (N) with fuzzy payoffs is established to obtain the uncertain parameters, namely the weight coefficient parameters of the threat indicators and the interval parameters of the threat matrix. Then, the zero-sum game with fuzzy payoffs is transformed into a zero-sum game with crisp payoffs (a matrix game) so that it can be solved. Moreover, a decision rule for the threat sequencing problem of multiple UCAVs is given based on the obtained uncertain parameters. Finally, numerical simulation results are presented to show the effectiveness of the proposed approach.
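The last step described above, solving a matrix game with crisp payoffs, can be sketched for the smallest case. The function below solves a 2x2 zero-sum game in closed form; it is only an illustration of what solving a matrix game means, and it does not reproduce the paper's fuzzy-to-crisp transformation or its threat matrix. The function name and the example payoffs are assumptions.

```python
def solve_2x2_zero_sum(payoff):
    """Solve a 2x2 zero-sum matrix game for the row (maximizing) player.

    payoff[i][j] is the row player's gain when row i meets column j.
    Returns (p, q, value): p is the probability of row 0, q is the
    probability of column 0, and value is the value of the game.
    (Hypothetical helper, not taken from the paper.)
    """
    (a, b), (c, d) = payoff
    row_mins = [min(a, b), min(c, d)]
    col_maxs = [max(a, c), max(b, d)]
    maximin, minimax = max(row_mins), min(col_maxs)
    if maximin == minimax:
        # Saddle point: both players have an optimal pure strategy.
        p = 1.0 if row_mins[0] == maximin else 0.0
        q = 1.0 if col_maxs[0] == minimax else 0.0
        return p, q, float(maximin)
    # No saddle point: each player mixes so that the opponent is indifferent.
    denom = a - b - c + d
    p = (d - c) / denom
    q = (d - b) / denom
    value = (a * d - b * c) / denom
    return p, q, value


# Illustrative payoffs only; they are not data from the paper.
p, q, value = solve_2x2_zero_sum([[3, -1], [-2, 4]])
```

Larger matrix games are usually solved by linear programming; the closed form is used here only to keep the sketch dependency-free.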
Funding: This paper is supported by the National Key R&D Program of China (2017YFB0802703) and the National Natural Science Foundation of China (61602052).
Abstract: As a core component of the network, web applications have become one of the preferred targets for attackers, because their static configuration simplifies the exploitation of vulnerabilities. Although moving target defense (MTD) has been proposed to increase the difficulty of attacks, no single approach can cope with different attacks; in addition, it is impossible to implement all of these approaches simultaneously because of resource limitations. Thus, the selection of an optimal defense strategy based on MTD has become a focus of research. In general, the confrontation between two players in the security domain is viewed as a stochastic game in which the reward matrices are known to both players. In a real security confrontation, however, the scenario is an incomplete-information game: each player can only observe the actions performed by the opponent, and the observed actions are not completely accurate. To describe the attacker's reward function accurately and reach the Nash equilibrium, this work simulated and updated the attacker's strategy selection distribution by observing and investigating the attacker's strategy selection history. Next, the possible rewards of the attacker in each confrontation were corrected via the observation matrix. On this basis, a Nash-Q learning algorithm with reward quantification was proposed to select the optimal strategy. Moreover, the performances of the Minimax-Q learning algorithm and the Naive-Q learning algorithm were compared and analyzed in the MTD environment. Finally, the experimental results showed that the strategy selection algorithm enables defenders to select a more reasonable defensive strategy and achieve the maximum possible reward.
Abstract: The paper provides an analysis of a sender-receiver sequential signaling game. The private information of the sender is transmitted with noise by a Machine, i.e., it does not always correctly reflect the state of nature. Hence, a truthful revelation by the sender of his information does not necessarily imply that the signal he sends is correct. Also, the receiver can take a correct action even if the sender transmits an incorrect signal. The payoffs of the two players depend on their combined actions. The perfect Bayesian equilibria that can result from different degrees of noise are analysed. The Bayesian updating of probabilities is explained. The fixed point, which makes the connection with the idea of rational expectations in economics, is calculated. Given a number of equilibria, we comment on the most credible one on the basis of the implied payoffs for both players. The equilibrium signals are an example of the formation of a language convention discussed by D. Lewis.
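The Bayesian updating mentioned above can be illustrated with a minimal sketch. It assumes a simple symmetric noise model in which the signal matches the true state with probability `accuracy` and is otherwise spread evenly over the remaining states; the paper's actual noise structure is not given in the abstract, and the function name, state labels, and numbers are hypothetical.

```python
def posterior_given_signal(prior, accuracy, signal):
    """Posterior over states of nature after observing one noisy signal.

    prior: dict mapping each state to its prior probability (two or more states).
    accuracy: probability that the signal equals the true state; the
        remaining probability is spread evenly over the other states.
    signal: the observed signal, which must be one of the states.
    (Hypothetical helper with an assumed noise model.)
    """
    n_other = len(prior) - 1
    unnormalized = {}
    for state, p_state in prior.items():
        # Likelihood of seeing this signal if `state` were the true state.
        likelihood = accuracy if state == signal else (1.0 - accuracy) / n_other
        unnormalized[state] = p_state * likelihood
    total = sum(unnormalized.values())
    # Bayes' rule: normalize prior times likelihood.
    return {state: w / total for state, w in unnormalized.items()}


# Illustrative numbers only.
belief = posterior_given_signal({"good": 0.3, "bad": 0.7}, accuracy=0.9, signal="good")
```

With a prior of 0.3 on "good" and a 90% accurate signal, the receiver's belief in "good" after seeing that signal is 0.27 / (0.27 + 0.07), roughly 0.79, so a noisy signal shifts the belief without making it certain.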
Funding: Supported by the National Social Science Foundation of China (No. 20BGL025) and the Postgraduate Practice Innovation Program of Jiangsu Province (No. SJCX200316).
Abstract: Given the fragmentation of public opinion dissemination and the lag in network users' cognition, the paper analyzes public opinion dissemination with incomplete information, which can provide a reference for controlling and guiding the spread of public opinion. Based on the derivative and secondary radiation of public opinion dissemination with incomplete information, the Susceptible-Susceptible-Infected-Recovered-Recovered-Infected (SSIRR-I) model is proposed. To account for the interaction between users, the Deffuant opinion dynamics model and evolutionary game theory are introduced to simulate the public opinion game between dissemination nodes and immune nodes. Finally, numerical simulations and an analysis of the results are given. The results reveal that the rate of opinion convergence significantly affects public opinion dissemination, which is positively correlated with the promotion effect of the dissemination node and negatively correlated with the suppression effect of the immune node. Derivative and secondary radiation have different effects on public opinion dissemination in the early stage but promote it in the later stage. The dominant immune nodes have an apparent inhibitory effect on the spread of public opinion; nevertheless, they cannot block it.
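For readers unfamiliar with the Deffuant opinion dynamics model named above, the following sketch shows its standard pairwise update rule: two agents move their opinions toward each other only when the opinions are already closer than a confidence threshold. It does not include the paper's SSIRR-I compartments or the evolutionary game coupling; the function names and parameter values are assumptions.

```python
import random


def deffuant_step(opinions, i, j, threshold, mu):
    """Apply one Deffuant interaction between agents i and j in place.

    The two opinions move toward each other by a fraction mu of their
    difference, but only if they differ by less than threshold.
    Returns True if the opinions were updated.
    """
    diff = opinions[j] - opinions[i]
    if abs(diff) >= threshold:
        return False
    opinions[i] += mu * diff
    opinions[j] -= mu * diff
    return True


def simulate(opinions, threshold, mu, steps, seed=0):
    """Run random pairwise interactions and return the final opinions."""
    rng = random.Random(seed)
    opinions = list(opinions)  # copy so the caller's list is untouched
    for _ in range(steps):
        i, j = rng.sample(range(len(opinions)), 2)
        deffuant_step(opinions, i, j, threshold, mu)
    return opinions


# Illustrative values only.
final = simulate([0.1, 0.2, 0.3, 0.4], threshold=0.5, mu=0.5, steps=200)
```

Each interaction preserves the sum of the two opinions, so the population mean is constant; `mu` controls how fast opinions converge, and `threshold` determines whether distant opinions can interact at all.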
Funding: Supported by the National Natural Science Foundation of China under Grant Nos. 61303074 and 61309013, and the Henan Province Science and Technology Project Funds under Grant No. 12210231002.
Abstract: Game-theoretic network security defence mostly applies complete-information game models or even static game models. To get closer to real networks and to defend actively, we propose a network attack-defence game model based on the signalling game, which is dynamic and has incomplete information. We improve the traditional quantization method for attack-defence strategies to meet the needs of the network signalling game model. Moreover, we give the calculation of the game equilibrium and analyse the optimal defence scheme. Finally, we analyse and verify the effectiveness of the model and method through a simulation experiment.
Funding: Supported in part by the Natural Science Foundation of China under Grants 61871446 and 61801238, the Postgraduate Research & Practice Innovation Program of Jiangsu Province under Grant KYCX18_0887, and the Natural Science Foundation of Jiangsu Province under Grant BK20170758.
Abstract: Recently, initiatives to integrate Internet of Things (IoT) technologies into smart buildings have attracted extensive attention for improving the performance of buildings and the comfort of occupants. However, the amount of data generated by IoT devices remains a challenge for building management systems (BMSs) in terms of intensity and complexity. In contrast to cloud computing and edge computing, we propose a computation sharing architecture in smart buildings that incentivizes idle computing devices (ICDs, the sellers) to take on computational tasks for the BMS (the buyer). In this paper, we design a hierarchical game model, consisting of a Stackelberg game and a Cournot game, to achieve a dynamic increase in computational capacity for the BMS. To guarantee the utility of the BMS and the ICDs, the Stackelberg game model is built to analyze the interactions between the BMS and the ICDs. Then, the Cournot game model is presented to formulate the internal competition among multiple ICDs. Under the subgame perfect Nash equilibrium, the BMS can quote the optimal pricing strategy, and the ICDs can share the corresponding optimal amount of computing resources. Finally, the simulation results show that the BMS's computational capacity is enhanced on demand and that each participant in the game obtains maximal utility.
Abstract: In this paper, we propose a general form of a multi-team Bertrand game. We then study a two-team Bertrand game in which each team consists of two firms, with heterogeneous strategies among teams and homogeneous strategies among players. We find the equilibrium solutions and the conditions for their local stability. Numerical simulations are used to illustrate the complex behaviour of the proposed model, such as period-doubling bifurcation and chaos. Finally, we use the feedback control method to control the model.
Funding: Supported by the National Natural Science Foundation of China (62001085) and the Sichuan Science and Technology Program (2021YFG0349).
Abstract: Cloud computing has been identified as a new opportunity for achieving the agility, reuse, and adaptive capabilities needed to support ever-changing IT trends and requirements. Unfortunately, the rapid evolution of these technologies also comes with open issues such as security, privacy, integrity, and quality of service, and with their possible detrimental consequences. In this work, the concept of insurance is introduced to compensate cloud computing customers who encounter such failures, provided that the service providers (SPs) have purchased insurance. In particular, we consider the situation in which the insurer is unable to see the system failure risk levels of the SPs, which is usually regarded as an incomplete-information market, in contrast with the optimal situation in a complete-information market. First, an insurance-based cloud computing architecture is proposed to build a monetary credit system in which the cloud computing SP pays a premium to the insurer for a certain coverage. Subsequently, a problem is formulated to solve for the optimal insurance plan in complete- and incomplete-information markets, and a detailed analysis of the insurance policies in both cases is provided. Furthermore, simulation results show the properties of the two insurance plans and the parameters that affect the design of the insurance plan.
Abstract: This paper investigates the design of incentive strategies when the follower's objective functions have parameters unknown to the leader. A design approach named Incentive Strategy with Unknown but Bounded error (ISUBB) is proposed. A simple example is given to explain the use of ISUBB.