Abstract: In general-sum games, taking all agents' collective rationality into account, we define the agents' global objective and propose a novel multi-agent reinforcement learning (RL) algorithm based on a global policy. In each learning step, all agents commit to selecting the global policy to achieve the global goal. We prove that this learning algorithm converges under certain restrictions on the stage games of the learned Q-values, and show that it has considerably lower computational time complexity than existing multi-agent learning algorithms for general-sum games. An example is analyzed to demonstrate the algorithm's merits.
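The abstract does not spell out the update rule, but the described scheme reads like joint-action Q-learning in which the costly Nash-equilibrium computation of Nash-Q is replaced by a "global policy" that maximizes the agents' summed Q-values. Below is a minimal Python sketch under that assumption; the environment interface (reset/step returning one reward per agent) and all names and hyperparameters are hypothetical, not the authors' code.

import itertools
import random
from collections import defaultdict

def global_policy_q_learning(env, n_agents, actions, episodes=1000,
                             alpha=0.1, gamma=0.9, epsilon=0.1):
    # Each agent i keeps Q[i][(state, joint_action)]; the "global policy"
    # picks the joint action maximizing the SUM of all agents' Q-values,
    # i.e. collective rationality instead of per-stage equilibrium solving.
    joint_actions = list(itertools.product(actions, repeat=n_agents))
    Q = [defaultdict(float) for _ in range(n_agents)]

    def global_action(state):
        return max(joint_actions,
                   key=lambda a: sum(Q[i][(state, a)] for i in range(n_agents)))

    for _ in range(episodes):
        state = env.reset()
        done = False
        while not done:
            # Epsilon-greedy exploration over joint actions.
            if random.random() < epsilon:
                a = random.choice(joint_actions)
            else:
                a = global_action(state)
            next_state, rewards, done = env.step(a)  # rewards: one per agent
            a_next = global_action(next_state)
            for i in range(n_agents):
                target = rewards[i] + gamma * (0.0 if done
                                               else Q[i][(next_state, a_next)])
                Q[i][(state, a)] += alpha * (target - Q[i][(state, a)])
            state = next_state
    return Q

Because the argmax over summed Q-values is a single pass over the joint-action set, each step costs O(|A|^n) arithmetic rather than solving an equilibrium of a stage game, which is consistent with the lower time complexity the abstract claims.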
Funding: This work was supported in part by the US Department of Energy (DOE), Office of Electricity and Office of Energy Efficiency and Renewable Energy under contract DE-AC05-00OR22725; in part by CURENT, an Engineering Research Center funded by the US National Science Foundation (NSF) and DOE under NSF award EEC-1041877; and in part by NSF award ECCS-1809458.
Abstract: In this paper, a day-ahead electricity market bidding problem with multiple strategic generation company (GENCO) bidders is studied. The problem is formulated as a Markov game, in which GENCO bidders interact with each other to develop their optimal day-ahead bidding strategies. To handle the unobservable information in the problem, a model-free, data-driven approach known as multi-agent deep deterministic policy gradient (MADDPG) is applied to approximate the Nash equilibrium (NE) of the Markov game. The MADDPG algorithm generalizes well owing to the automatic feature extraction ability of its deep neural networks. The algorithm is tested on an IEEE 30-bus system with three competitive GENCO bidders in both an uncongested case and a congested case. Comparisons with a truthful bidding strategy and state-of-the-art deep reinforcement learning methods, including deep Q-network and deep deterministic policy gradient (DDPG), demonstrate that the applied MADDPG algorithm finds a superior bidding strategy for all market participants, with increased profit gains. In addition, a comparison with a conventional model-based method shows that the MADDPG algorithm has higher computational efficiency, making it feasible for real-world applications.
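For context, the core of MADDPG is centralized training with decentralized execution: each bidder's critic is trained on the joint observations and bids of all bidders, while its actor acts on local information only. The PyTorch sketch below shows one update step for one bidder under assumed tensor shapes; the market environment, replay buffer, target-network soft updates, exploration noise, and terminal-state masking are omitted, and all names are illustrative rather than taken from the paper.

import torch
import torch.nn as nn

class Actor(nn.Module):
    # Decentralized actor: maps one bidder's own observation to a bid in [-1, 1].
    def __init__(self, obs_dim, act_dim):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(obs_dim, 64), nn.ReLU(),
                                 nn.Linear(64, act_dim), nn.Tanh())

    def forward(self, obs):
        return self.net(obs)

class Critic(nn.Module):
    # Centralized critic: scores the JOINT observations and bids of all bidders.
    def __init__(self, joint_obs_dim, joint_act_dim):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(joint_obs_dim + joint_act_dim, 64),
                                 nn.ReLU(), nn.Linear(64, 1))

    def forward(self, all_obs, all_acts):
        return self.net(torch.cat([all_obs, all_acts], dim=-1))

def maddpg_update(i, batch, actors, critics, target_actors, target_critics,
                  actor_opts, critic_opts, gamma=0.99):
    # One MADDPG update for bidder i. batch = (obs, acts, rews, next_obs),
    # stacked over agents: obs/next_obs are (B, n, obs_dim),
    # acts is (B, n, act_dim), rews is (B, n).
    obs, acts, rews, next_obs = batch
    n = obs.shape[1]
    # Critic target uses every bidder's target actor (centralized training).
    with torch.no_grad():
        next_acts = torch.stack([target_actors[j](next_obs[:, j])
                                 for j in range(n)], dim=1)
        y = rews[:, i:i + 1] + gamma * target_critics[i](
            next_obs.flatten(1), next_acts.flatten(1))
    critic_loss = nn.functional.mse_loss(
        critics[i](obs.flatten(1), acts.flatten(1)), y)
    critic_opts[i].zero_grad(); critic_loss.backward(); critic_opts[i].step()
    # Actor ascends its own critic; other bidders' sampled bids stay fixed.
    acts_pi = torch.stack([actors[j](obs[:, j]) if j == i else acts[:, j]
                           for j in range(n)], dim=1)
    actor_loss = -critics[i](obs.flatten(1), acts_pi.flatten(1)).mean()
    actor_opts[i].zero_grad(); actor_loss.backward(); actor_opts[i].step()

At execution (bidding) time only the trained actors are needed, which is what makes the method workable when rivals' costs and bids are unobservable.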
Funding: Sponsored by the Fundamental Research Funds for the Central Universities of China (Grant No. DL12BB11), the Program for New Century Excellent Talents in University (Grant No. NCET-10-0279), and a Heilongjiang Postdoctoral Grant (Grant No. LRB11-334).
Abstract: To resolve conflicts and deadlocks among agents in a multiagent system, an algorithm for multiagent coordination and cooperation was proposed. Treating each agent in the multiagent system as a player, a Markov game model of the pursuit problem was built, and multiagent reinforcement learning was used to solve for the optimal Nash equilibrium. Statistical methods and Bayes' formula were used to estimate the policy knowledge of the other players, and the relative mean deviation method was used to evaluate the confidence degree of these estimates in order to increase the convergence speed. Simulation results on the pursuit problem showed the feasibility and validity of the proposed algorithm.
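As a rough illustration of the opponent-modeling step, the sketch below keeps add-one (Dirichlet-style) counts of an opponent's actions per state, which is what a Bayes update reduces to for a categorical policy, and uses the relative mean deviation of the estimated distribution as a confidence measure. The exact formulas and the action-selection rule in the paper may differ; the threshold rule shown here is an assumption for illustration only.

import numpy as np

class OpponentModel:
    def __init__(self, n_states, n_actions):
        # Uniform prior of one pseudo-count per action; Bayesian updating of
        # a categorical policy then amounts to add-one frequency counting.
        self.counts = np.ones((n_states, n_actions))

    def observe(self, state, action):
        self.counts[state, action] += 1

    def policy(self, state):
        # Posterior mean estimate of the opponent's mixed strategy.
        return self.counts[state] / self.counts[state].sum()

    def confidence(self, state):
        # Relative mean deviation: mean absolute deviation of the estimated
        # probabilities from their mean (1/n), divided by that mean. Near 0
        # when the estimate is still uniform, larger as it sharpens.
        p = self.policy(state)
        return np.mean(np.abs(p - p.mean())) / p.mean()

def select_action(q_joint, model, state, nash_action, threshold=0.5):
    # Hypothetical rule: best-respond to the estimated opponent policy once
    # confidence is high enough, otherwise fall back to the Nash action.
    # q_joint has shape (n_states, my_actions, opp_actions).
    if model.confidence(state) >= threshold:
        expected_q = q_joint[state] @ model.policy(state)
        return int(np.argmax(expected_q))
    return nash_action

Falling back to the equilibrium action while the opponent estimate is still flat is one plausible way the confidence measure could speed convergence, since early, unreliable estimates never drive the pursuers' choices.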