Funding: This paper is supported by the National Key R&D Program of China (2017YFB0802703) and the National Natural Science Foundation of China (61602052).
Abstract: As a core component of the network, web applications have become one of the preferred targets for attackers, because their static configuration simplifies the exploitation of vulnerabilities. Although moving target defense (MTD) has been proposed to increase the difficulty of attacks, no single approach can cope with all attacks; in addition, it is impossible to implement all of these approaches simultaneously because of resource limitations. Thus, the selection of an optimal defense strategy based on MTD has become a focus of research. In general, the confrontation of two players in the security domain is viewed as a stochastic game in which the reward matrices are known to both players. In a real security confrontation, however, the game is one of incomplete information: each player can only observe the actions performed by the opponent, and the observed actions are not completely accurate. To accurately describe the attacker's reward function and reach the Nash equilibrium, this work simulated and updated the attacker's strategy selection distribution by observing and investigating the attacker's strategy selection history. Next, the possible rewards of the attacker in each confrontation were corrected via the observation matrix. On this basis, a Nash-Q learning algorithm with reward quantification was proposed to select the optimal strategy. Moreover, the performances of the Minimax-Q learning algorithm and the Naive-Q learning algorithm were compared and analyzed in the MTD environment. Finally, the experimental results showed that the strategy selection algorithm enables defenders to select a more reasonable defensive strategy and achieve the maximum possible reward.
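The abstract does not give the update rules, so the following is only a minimal sketch of one way the two steps it names could fit together: estimating the attacker's strategy distribution from observed history, and correcting the attacker's expected reward through an observation matrix. All names (`posterior_over_actions`, `corrected_reward`, `update_distribution`), the Bayesian form of the correction, and the example numbers are assumptions made for illustration, not the authors' method.

```python
# Hypothetical sketch: noisy observation of attacker actions.
# obs_matrix[a][o] = assumed probability of observing action o
# when the attacker's true action is a.

def posterior_over_actions(prior, obs_matrix, observed):
    """Bayes rule: P(true action a | observed o) is proportional to prior[a] * obs_matrix[a][o]."""
    weights = [prior[a] * obs_matrix[a][observed] for a in range(len(prior))]
    total = sum(weights)
    if total == 0:
        # The observation is impossible under the model; fall back to the prior.
        return list(prior)
    return [w / total for w in weights]


def corrected_reward(posterior, reward_matrix, defender_action):
    """Expected attacker reward, averaging over which action was truly played.

    reward_matrix[a][d] = assumed attacker reward for attacker action a
    against defender action d.
    """
    return sum(p * reward_matrix[a][defender_action]
               for a, p in enumerate(posterior))


def update_distribution(counts, posterior):
    """Add the soft (posterior) evidence to the history counts and renormalize."""
    new_counts = [c + p for c, p in zip(counts, posterior)]
    total = sum(new_counts)
    return new_counts, [c / total for c in new_counts]


if __name__ == "__main__":
    prior = [0.5, 0.5]                      # current estimate of attacker strategy
    obs_matrix = [[0.8, 0.2], [0.2, 0.8]]   # observations are right 80% of the time
    rewards = [[1.0, 0.0], [0.0, 2.0]]      # illustrative attacker rewards

    post = posterior_over_actions(prior, obs_matrix, observed=0)
    print("posterior:", post)
    print("corrected reward:", corrected_reward(post, rewards, defender_action=0))
    counts, estimate = update_distribution([1.0, 1.0], post)
    print("updated strategy estimate:", estimate)
```

In a learning loop of the kind the abstract describes, a corrected reward like this would stand in for the unknown attacker reward when the defender updates its Q-values; how the paper actually quantifies the reward is not stated here.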
Funding: Supported by the National Natural Science Foundation of China (Grant Nos. 61933010 and 61903301) and the Shaanxi Aerospace Flight Vehicle Design Key Laboratory.
Abstract: Cooperative autonomous air combat of multiple unmanned aerial vehicles (UAVs) is one of the main combat modes in future air warfare, and it becomes even more complicated when the situation changes rapidly and information about the opponents is uncertain. This paper therefore presents a cooperative decision-making method based on an incomplete information dynamic game to generate maneuver strategies for multiple UAVs in air combat. Firstly, a cooperative situation assessment model is presented to measure the overall combat situation. Secondly, an incomplete information dynamic game model is proposed to model the dynamic process of air combat, and a dynamic Bayesian network is designed to infer the tactical intention of the opponent. Then, a reinforcement learning framework based on the multiagent deep deterministic policy gradient is established to obtain the perfect Bayes-Nash equilibrium solution of the air combat game model. Finally, a series of simulations is conducted to verify the effectiveness of the proposed method, and the simulation results show effective synergies and cooperative tactics.
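The structure of the paper's dynamic Bayesian network is not given in the abstract. As a rough illustration of how belief about an opponent's tactical intention can be tracked over time, the sketch below performs one step of discrete Bayesian filtering (predict with a transition model, then reweight by the likelihood of the latest observation). This is a deliberately simplified stand-in: the function name `filter_step`, the two intention states, and every probability are hypothetical.

```python
# Hypothetical sketch: one predict/update step of a discrete Bayes filter.

def filter_step(belief, transition, likelihood):
    """Return the updated belief over opponent intentions.

    belief[i]        : current probability of intention i
    transition[i][j] : assumed probability that intention i becomes j at the next step
    likelihood[j]    : assumed probability of the observed maneuver given intention j
    """
    n = len(belief)
    # Predict: propagate the belief through the transition model.
    predicted = [sum(belief[i] * transition[i][j] for i in range(n))
                 for j in range(n)]
    # Update: weight each intention by how well it explains the observation.
    unnormalized = [predicted[j] * likelihood[j] for j in range(n)]
    total = sum(unnormalized)
    if total == 0:
        # Observation has zero probability under every intention; keep the prediction.
        return predicted
    return [u / total for u in unnormalized]


if __name__ == "__main__":
    belief = [0.5, 0.5]                     # e.g. [attack, evade], illustrative labels
    transition = [[0.9, 0.1], [0.2, 0.8]]   # intentions tend to persist
    likelihood = [0.7, 0.1]                 # observed maneuver favors "attack"
    print(filter_step(belief, transition, likelihood))
```

Repeating this step at each decision epoch yields a running estimate of the opponent's intention that a game-theoretic or reinforcement learning policy could condition on.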
Abstract: This paper discusses the relationship between two independently developed models of games with incomplete information: hierarchical hypergames and Bayesian games. It can be considered a generalization of the previous study on the theoretical comparison of simple hypergames and Bayesian games (Sasaki and Kijima, 2012), taking into account the hierarchy of perceptions, i.e., an agent's perception of the other agents' perceptions, and so on. The authors first introduce a general way of transforming any hierarchical hypergame into a corresponding Bayesian game, which is called the Bayesian representation of hierarchical hypergames. The authors then show that some equilibrium concepts for hierarchical hypergames can be associated with those for Bayesian games, and discuss the implications of the results.
Funding: This work is supported by the National Natural Science Foundation of China (60632030); the Hi-Tech Research and Development Program of China (2006AA01Z276); the Integrated Project of the 6th Framework Program of the European Commission (IST-2005-027714); and the China-European Union Science and Technology Cooperation Foundation of the Ministry of Science and Technology of China (0516).
Abstract: A bargaining-based mechanism for sharing spectrum between radio access networks (RANs) belonging to multiple operators is studied, with the aim of improving spectrum utilization efficiency and maximizing network revenue. By introducing an intelligent agent, each RAN gains the ability to trade spectrum with other RANs, including exchanging trading information and making the final decision. The proposed inter-operator spectrum sharing mechanism is modeled as an infinite-horizon bargaining game with incomplete information, and the resulting bargaining game has a unique sequential equilibrium. The implementation is then refined based on this analysis. Simulation results show that the proposed mechanism outperforms the conventional fixed spectrum management (FSM) method in network revenue, spectrum efficiency, and call blocking rate.