Funding: Supported by the National High Technology Research and Development Program of China (863 Program) (2006AA04Z183), the National Natural Science Foundation of China (60621001, 60534010, 60572070, 60774048, 60728307), and the Program for Changjiang Scholars and Innovative Research Groups of China (60728307, 4031002)
Abstract: Few studies focus on solution methods for finding a Nash equilibrium of zero-sum games. We discuss the use of Karmarkar's interior point method to solve the Nash equilibrium problem of a zero-sum game and prove that it is theoretically a polynomial-time algorithm. We implement Karmarkar's method, and preliminary computational results show that it performs well for zero-sum games. We also mention an affine scaling method that can compute Nash equilibria of general zero-sum games effectively.
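As context for the interior-point approach, the equilibrium of a two-person zero-sum matrix game is the solution of a standard linear program, which is the kind of problem an interior-point solver such as Karmarkar's method would be applied to. Below is a minimal sketch using SciPy's generic interior-point LP solver rather than Karmarkar's algorithm itself; the function name and the matching-pennies example are illustrative.

```python
import numpy as np
from scipy.optimize import linprog

def zero_sum_value(A):
    """Solve a two-player zero-sum matrix game by linear programming.

    A[i, j] is the row player's payoff when the row player plays i and the
    column player plays j. Returns (game value, row player's mixed strategy).
    """
    A = np.asarray(A, dtype=float)
    shift = 1.0 - A.min()            # shift payoffs so the game value is positive
    B = A + shift
    m, n = B.shape
    # Row player: min sum(y) s.t. B^T y >= 1, y >= 0; then value = 1/sum(y), x = y*value
    res = linprog(c=np.ones(m),
                  A_ub=-B.T, b_ub=-np.ones(n),
                  bounds=[(0, None)] * m,
                  method="highs-ipm")  # HiGHS interior-point LP solver (not Karmarkar's)
    y = res.x
    value = 1.0 / y.sum()
    x = y * value
    return value - shift, x

# Matching pennies: value 0, uniform optimal strategy
print(zero_sum_value([[1, -1], [-1, 1]]))
```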
Abstract: In this paper, we consider multiobjective two-person zero-sum games with vector payoffs and vector fuzzy payoffs. We translate such games into corresponding multiobjective programming problems and introduce the pessimistic Pareto optimal solution concept, under the assumption that each player supposes the opponent adopts the strategy most disadvantageous to that player. It is shown that any pessimistic Pareto optimal solution can be obtained using linear programming techniques, even if the membership functions for the objective functions are nonlinear. Moreover, we propose interactive algorithms based on the bisection method to obtain a pessimistic compromise solution from the set of all pessimistic Pareto optimal solutions. To show the efficiency of the proposed method, we illustrate the interactive process with an application to a vegetable shipment problem.
Abstract: The minimax algorithm and machine learning techniques have been studied for decades in pursuit of ideal play in games such as chess and backgammon. In these fields, successive generations of work have refined pruning and the effectiveness of the evaluation function, so well-developed algorithms exist for handling a wide range of game situations. As a traditional zero-sum game, however, Connect-4 has received less attention with the classic minimax algorithm than other members of the zero-sum family. In recent years, a new generation of heuristics has been created for this problem, based on published research, expertise, and playing experience. This paper introduces a self-developed heuristic, supported by results from prior research and by our own experience playing against a publicly available online Connect-4 system. While most previous work focused on winning algorithms and knowledge-based approaches, we complement that work with an analysis of heuristics. We conducted three experiments on the relationship among functionality, search depth, and number of features, and ran contrastive tests against the online sample. Unlike the sample, which is based on summarized experience and generalized features, our heuristic concentrates on the detailed connections between pieces on the board. By analysing the winning percentages when our version plays against the online sample at different search depths, we find that our heuristic combined with the minimax algorithm performs very well in the early stages of the game. Because some nodes in the game tree have no influence on the final decision of the minimax algorithm, we use alpha-beta pruning to discard such meaningless nodes, which greatly increases minimax efficiency. Through the contrastive experiment with the online sample, this paper also examines two basic characteristics of the minimax algorithm: search depth and number of features. According to the experiments, both characteristics affect the decision at each step, and neither dominates on its own. Finally, we discuss potential future directions for Connect-4 optimization, such as finer adjustment of heuristic values and pruning of inefficient branches in the search tree.
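To make the pruning step concrete, the sketch below shows generic depth-limited minimax with alpha-beta cut-offs. It is not the paper's Connect-4 heuristic; the `children` and `evaluate` callbacks stand in for a move generator and a heuristic evaluation function and are assumptions of this illustration.

```python
import math

def alphabeta(state, depth, alpha, beta, maximizing, children, evaluate):
    """Depth-limited minimax with alpha-beta pruning.

    `children(state, maximizing)` yields successor states; `evaluate(state)`
    returns a heuristic value from the maximizing player's point of view.
    Branches that cannot influence the final decision are cut off.
    """
    if depth == 0:
        return evaluate(state)
    succ = list(children(state, maximizing))
    if not succ:                     # terminal position: no legal moves
        return evaluate(state)
    if maximizing:
        value = -math.inf
        for child in succ:
            value = max(value, alphabeta(child, depth - 1, alpha, beta, False, children, evaluate))
            alpha = max(alpha, value)
            if alpha >= beta:        # remaining siblings cannot change the result
                break
        return value
    else:
        value = math.inf
        for child in succ:
            value = min(value, alphabeta(child, depth - 1, alpha, beta, True, children, evaluate))
            beta = min(beta, value)
            if alpha >= beta:
                break
        return value
```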
Abstract: In this paper we study zero-sum stochastic games. The optimality criterion is the long-run expected average criterion, and the payoff function may be unbounded from above and below. We give a new set of conditions for the existence of a value and of a pair of optimal stationary strategies. Our conditions are slightly weaker than those in the previous literature, and the new sufficient conditions for the existence of a pair of optimal stationary strategies are imposed on the primitive data of the model. Our results are illustrated with a queueing system for which our conditions are satisfied but some conditions from the earlier literature fail to hold.
Funding: Supported by the National Science Foundation (No. ECCS-0801330) and the Army Research Office (No. W91NF-05-1-0314)
Abstract: This paper presents an approximate/adaptive dynamic programming (ADP) algorithm that uses the idea of integral reinforcement learning (IRL) to determine online the Nash equilibrium solution of the two-player zero-sum differential game with linear dynamics and an infinite-horizon quadratic cost. The algorithm is built around an iterative method developed in the control engineering community for solving the continuous-time game algebraic Riccati equation (CT-GARE), which underlies the game problem. We show how ADP techniques enhance this offline method, allowing an online solution without requiring complete knowledge of the system dynamics. The feasibility of the ADP scheme is demonstrated in simulation for a power system control application, where the adaptation goal is the control policy that best handles the worst-case load disturbance.
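For orientation, the object being solved is the game algebraic Riccati equation A'P + PA + Q - PBR^{-1}B'P + (1/γ²)PDD'P = 0 for dynamics ẋ = Ax + Bu + Dw. The sketch below is a naive offline fixed-point iteration that refreshes both players' gains from the current P and solves a Lyapunov equation at each step. It is not the paper's online IRL algorithm, convergence is not guaranteed in general, and the scalar (1x1) matrices at the end are purely illustrative.

```python
import numpy as np
from scipy.linalg import solve_continuous_lyapunov

def zero_sum_gare_iteration(A, B, D, Q, R, gamma, iters=50):
    """Naive offline Lyapunov iteration for the zero-sum game Riccati equation
    A'P + PA + Q - P B R^{-1} B' P + (1/gamma^2) P D D' P = 0."""
    n = A.shape[0]
    P = np.zeros((n, n))
    Rinv = np.linalg.inv(R)
    for _ in range(iters):
        K = Rinv @ B.T @ P                   # control policy    u = -K x
        L = (1.0 / gamma**2) * D.T @ P       # disturbance policy w =  L x
        Acl = A - B @ K + D @ L              # closed-loop dynamics under both policies
        M = Q + K.T @ R @ K - gamma**2 * L.T @ L
        # Next value matrix solves Acl' P + P Acl + M = 0
        P = solve_continuous_lyapunov(Acl.T, -M)
    return P

# Scalar illustration; the fixed point is P ≈ 0.431 for these numbers
A, B, D = np.array([[-1.0]]), np.array([[1.0]]), np.array([[1.0]])
Q, R, gamma = np.array([[1.0]]), np.array([[1.0]]), 2.0
print(zero_sum_gare_iteration(A, B, D, Q, R, gamma))
```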
Abstract: Combat simulation of weapon-equipment systems-of-systems belongs to the study of complex systems. This work is the first exploratory study of adversarial cognitive decision-making in a network information system-of-systems (NISoS) based on Nash-Q. The Nash-Q algorithm has a form similar to joint Q-learning; the difference lies in how the joint policy is computed. For a zero-sum system-of-systems combat model, Nash-Q can obtain a mixed strategy by solving a Nash equilibrium without requiring the historical information of other agents, and is therefore easier to implement and more efficient. A campaign-level zero-sum dynamic combat game model is established, and a method for solving the Nash equilibrium is given that does not require complete information about the other agents. In addition, a Gaussian radial basis function neural network is used to represent the Q table, giving the algorithm better discretization performance and generalization ability. Finally, NISoS combat simulation experiments verify the effectiveness of the algorithm, which achieves higher payoffs than Q-learning-based and rule-based decision algorithms and performs well in offline decision-making.
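To illustrate why, in the zero-sum case, Nash-Q needs no model of the opponent's policy or history, the sketch below performs one tabular Nash-Q update in which the bootstrap target is the minimax value of the next stage game. It reuses the `zero_sum_value` LP helper sketched earlier in this section; the state/action encoding, learning rate, and discount factor are illustrative assumptions, not the paper's model.

```python
import numpy as np

def nash_q_update(Q, s, a_row, a_col, reward, s_next, alpha=0.1, gamma=0.9):
    """One Nash-Q update for a two-player zero-sum Markov game.

    Q is a dict mapping each state to the row player's stage-game payoff
    matrix over joint actions. The bootstrap target uses the minimax value
    of the next stage game (computed by LP), so no information about the
    opponent's policy or history is needed.
    """
    next_value, _ = zero_sum_value(Q[s_next])   # minimax value of the next stage game
    target = reward + gamma * next_value        # zero-sum Bellman target
    Q[s][a_row, a_col] += alpha * (target - Q[s][a_row, a_col])

# Illustrative two-state game with 2x2 joint-action Q tables
Q = {0: np.zeros((2, 2)), 1: np.array([[1.0, -1.0], [-1.0, 1.0]])}
nash_q_update(Q, s=0, a_row=0, a_col=1, reward=0.5, s_next=1)
print(Q[0])
```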
Funding: This work was supported by the National Science Foundation (ECS-0501451) and the Army Research Office (W91NF-05-1-0314).
Abstract: In this paper, neural networks are used to approximately solve the finite-horizon constrained-input H-infinity state feedback control problem. The method is based on solving a related Hamilton-Jacobi-Isaacs equation of the corresponding finite-horizon zero-sum game. The game value function is approximated by a neural network with time-varying weights. It is shown that the neural network approximation converges uniformly to the game value function and that the resulting nearly optimal constrained feedback controller provides closed-loop stability and bounded L2 gain. The result is a nearly optimal H-infinity feedback controller with time-varying coefficients that is computed a priori offline. The effectiveness of the method is shown on the Rotational/Translational Actuator benchmark nonlinear control problem.