Funding: Ministry of Education, Singapore, under AcRF TIER 1 Grant RG64/23; the Eric and Wendy Schmidt AI in Science Postdoctoral Fellowship, a Schmidt Futures program, USA.
Abstract: Multi-agent reinforcement learning (MARL) has been a rapidly evolving field. This paper presents a comprehensive survey of MARL and its applications. We trace the historical evolution of MARL, highlight its progress, and discuss related survey works. Then, we review the existing works addressing inherent challenges and those focusing on diverse applications. Some representative stochastic games, MARL means, spatial forms of MARL, and task classification are revisited. We then conduct an in-depth exploration of a variety of challenges encountered in MARL applications. We also address critical operational aspects, such as hyperparameter tuning and computational complexity, which are pivotal in practical implementations of MARL. Afterward, we give a thorough overview of the applications of MARL to intelligent machines and devices, chemical engineering, biotechnology, healthcare, and societal issues, which highlights the extensive potential and relevance of MARL within both current and future technological contexts. Our survey also encompasses a detailed examination of benchmark environments used in MARL research, which are instrumental in evaluating MARL algorithms and demonstrate the adaptability of MARL to diverse application scenarios. Finally, we give our outlook for MARL and discuss its related techniques and potential future applications.
Funding: supported by the National Natural Science Foundation of China (61503407, 61806219, 61703426, 61876189, 61703412) and the China Postdoctoral Science Foundation (2016M602996).
Abstract: The multi-agent system is the optimal solution to complex intelligent problems. In accordance with game theory, the concept of loyalty is introduced to analyze the relationship between agents' individual income and global benefits and to build the logical architecture of the multi-agent system. To verify the feasibility of the method, the recurrent neural network is optimized, the bi-directional coordination network is built as the training network for deep learning, and specific training scenes are simulated as the training background. After a certain number of training iterations, the model can learn simple strategies autonomously, and as the training time increases, the complexity of the learned strategies rises gradually. The model is verified to be realizable through examples of obstacle avoidance, firepower distribution, and cooperative cover. With the same resources, the model exhibits better convergence than other deep learning training networks and is less prone to falling into a local endless loop. Furthermore, its learned strategies are stronger than those of a rule-based training model, which is of great practical value.
Funding: supported in part by the National Natural Science Foundation of China (U20A2068, 11771013) and the Zhejiang Provincial Natural Science Foundation of China (LD19A010001).
Abstract: The secure dominating set (SDS), a variant of the dominating set, is an important combinatorial structure used in wireless networks. In this paper, we apply algorithmic game theory to study the minimum secure dominating set (MinSDS) problem in a multi-agent system. We design a game framework for SDS and show that every Nash equilibrium (NE) is a minimal SDS, which is also a Pareto-optimal solution. We prove that the proposed game is an exact potential game, and thus an NE exists, and we design a polynomial-time distributed local algorithm that converges to an NE in O(n) rounds of interactions. Extensive experiments are conducted to test the performance of our algorithm, and some interesting phenomena are observed.
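For readers unfamiliar with the structure, the following is a minimal sketch of a checker for the standard definition of a secure dominating set: a dominating set S is secure if every vertex v outside S has a neighbour u in S such that swapping u for v still leaves a dominating set. It is not the game-based distributed algorithm from the abstract; the function names and the adjacency-dictionary representation are illustrative assumptions.

```python
from typing import Dict, Set

Graph = Dict[int, Set[int]]  # adjacency sets of an undirected graph (assumed representation)


def is_dominating(graph: Graph, s: Set[int]) -> bool:
    """Return True if every vertex is in s or has at least one neighbour in s."""
    return all(v in s or bool(graph[v] & s) for v in graph)


def is_secure_dominating(graph: Graph, s: Set[int]) -> bool:
    """Return True if s is dominating and every outside vertex has a defender.

    A defender of v is a neighbour u in s such that (s - {u}) | {v}
    is still a dominating set.
    """
    if not is_dominating(graph, s):
        return False
    for v in graph:
        if v in s:
            continue
        defenders = graph[v] & s
        if not any(is_dominating(graph, (s - {u}) | {v}) for u in defenders):
            return False
    return True


# Path graph 0 - 1 - 2 (hypothetical example input).
path = {0: {1}, 1: {0, 2}, 2: {1}}
print(is_secure_dominating(path, {1}))     # False: moving the guard to 0 leaves 2 uncovered
print(is_secure_dominating(path, {0, 2}))  # True
```

On the three-vertex path, the centre alone dominates the graph but is not secure, while the two endpoints together form a secure dominating set.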
Funding: supported by the National Key R&D Program of China (2017YFB1400105).
Abstract: In the evolutionary game of the same task for groups, changes in game rules, personal interests, crowd size, and external supervision have uncertain effects on individual decision-making and game results. Within the Markov decision framework, a single-task multi-decision evolutionary game model based on multi-agent reinforcement learning is proposed to explore the evolutionary rules in the process of a game. The model can improve the result of an evolutionary game and facilitate the completion of the task. First, based on multi-agent theory, a negative feedback tax penalty mechanism is proposed to solve the existing problems in the original model and to guide the strategy selection of individuals in the group. In addition, to evaluate the evolutionary game results of the group in the model, a method for calculating the group intelligence level is defined. Second, the Q-learning algorithm is used to improve the guiding effect of the negative feedback tax penalty mechanism. In the model, the selection strategy of the Q-learning algorithm is improved, and a bounded-rationality evolutionary game strategy is proposed based on the rules of evolutionary games and the bounded rationality of individuals. Finally, simulation results show that the proposed model can effectively guide individuals to choose cooperation strategies that are beneficial to task completion and stability under different negative feedback factor values and different group sizes, thereby improving the group intelligence level.
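As background for the Q-learning component mentioned above, here is a minimal sketch of the textbook tabular Q-learning update, Q(s, a) ← Q(s, a) + α [r + γ max_a' Q(s', a') − Q(s, a)]. It does not reproduce the paper's improved selection strategy or its tax penalty mechanism; the state name, action names, and reward values are hypothetical.

```python
from collections import defaultdict


def q_update(q, state, action, reward, next_state, actions, alpha=0.1, gamma=0.9):
    """Apply one tabular Q-learning update in place and return the new value."""
    # Greedy estimate of the value of the next state.
    best_next = max(q[(next_state, a)] for a in actions)
    td_target = reward + gamma * best_next
    q[(state, action)] += alpha * (td_target - q[(state, action)])
    return q[(state, action)]


# Hypothetical two-action game with a single state "s0".
q = defaultdict(float)            # unseen (state, action) pairs start at 0.0
actions = ["cooperate", "defect"]

first = q_update(q, "s0", "cooperate", 1.0, "s0", actions, alpha=0.5)
second = q_update(q, "s0", "cooperate", 1.0, "s0", actions, alpha=0.5)
print(first, second)
```

In a setting like the one described in the abstract, a penalty such as a tax would presumably enter through the `reward` argument; that mapping is an assumption and is not specified here.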
Funding: supported by the National Natural Science Foundation of China (52277107, 51977115) and the Shenzhen Science and Technology Innovation Program (WDZC20220808143010001).
Abstract: While price schedules can help improve the economic efficiency of renewable energy-powered microgrids, time-of-use (TOU) pricing has been identified as an effective way to support microgrid development, which is presently limited by its high costs. In this study, we propose an evolutionary game theoretic model to explore optimal TOU pricing for the development of renewable energy-powered microgrids by applying a multi-agent system that comprises a government agent, a local utility company agent, and different types of consumer agents. In the proposed model, we design objective functions for the company and the consumers and obtain a Nash equilibrium using backward induction. Two pricing strategies, namely TOU seasonal pricing and TOU monthly pricing, are evaluated and compared with traditional fixed pricing. The numerical results demonstrate that TOU schedules have significant potential for the development of renewable energy-powered microgrids and are recommended for an electric company as a replacement for traditional fixed pricing. Additionally, TOU monthly pricing is more suitable than TOU seasonal pricing for microgrid development.
Funding: supported by the National Natural Science Foundation of China (69973016).
Abstract: The traveling salesman problem (TSP) is a classical optimization problem and is NP-hard. This paper presents a new method, a multi-agent approach based on a genetic algorithm and an ant colony system, to solve the TSP. Three kinds of agents with different functions are designed in the proposed multi-agent architecture. The first kind is the ant colony optimization agent, whose function is to generate new solutions continuously. The second kind comprises the selection agent, crossover agent, and mutation agent, whose function is to optimize the current group of solutions. The third kind is the fast local searching agent, whose function is to optimize the best solution found since the beginning of the trial. The experimental results show that the proposed hybrid approach performs well with respect to both solution quality and computation speed.
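The abstract does not say which local search the third agent uses. As an illustration only, the sketch below assumes 2-opt, a common TSP local search that repeatedly reverses a tour segment whenever doing so shortens the tour. The function names and the four-city instance are hypothetical.

```python
from math import dist
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def tour_length(tour: Sequence[int], pts: Sequence[Point]) -> float:
    """Length of the closed tour that visits pts in the given order."""
    n = len(tour)
    return sum(dist(pts[tour[i]], pts[tour[(i + 1) % n]]) for i in range(n))


def two_opt(tour: Sequence[int], pts: Sequence[Point]) -> List[int]:
    """Improve a tour by segment reversals until no reversal shortens it."""
    best = list(tour)
    improved = True
    while improved:
        improved = False
        for i in range(1, len(best) - 1):
            for j in range(i + 1, len(best)):
                # Reverse the segment best[i..j], keeping the start city fixed.
                cand = best[:i] + best[i:j + 1][::-1] + best[j + 1:]
                if tour_length(cand, pts) < tour_length(best, pts) - 1e-12:
                    best = cand
                    improved = True
    return best


# Unit square; the initial tour crosses itself along both diagonals.
square = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (0.0, 1.0)]
start = [0, 2, 1, 3]
result = two_opt(start, square)
print(tour_length(start, square), tour_length(result, square))
```

This version recomputes the full tour length for every candidate, which keeps the code short but costs O(n) per check; a faster implementation would compare only the two edges that change.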
Funding: supported by the National Natural Science Foundation of China (No. 60674050, 60736022, 10972002, 60774089, 60704039).
Abstract: This paper studies the consensus problem for a group of agents with switching topology and time-varying communication delays, where the dynamics of each agent are modeled as a high-order integrator. A linear distributed consensus protocol is proposed that depends only on the agent's own information and partial information from its neighbors. By introducing a decomposition of the state vector and performing a state-space transformation, the closed-loop dynamics of the multi-agent system are converted into two decoupled subsystems. Based on the decoupled subsystems, sufficient conditions for convergence to consensus are established, which provide upper bounds on the admissible communication delays. The explicit expression of the consensus state is also derived. Moreover, the results on consensus seeking for the group of high-order agents are extended to a network of agents whose dynamics are modeled as a completely controllable linear time-invariant system. It is proved that convergence to consensus of this network is equivalent to that of the group of high-order agents. Finally, some numerical examples are given to demonstrate the effectiveness of the main results.
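To give a concrete feel for a linear distributed consensus protocol, the following is a deliberately simplified sketch: discrete-time single-integrator agents on a fixed undirected graph with no communication delay, updated as x_i ← x_i + ε Σ_j (x_j − x_i) over neighbours j. This is the basic textbook setting, not the high-order, switching-topology, delayed case analysed in the paper. The graph, step size, and initial states are assumptions.

```python
from typing import Dict, List


def consensus_step(x: List[float], neighbors: Dict[int, List[int]], eps: float) -> List[float]:
    """One synchronous update: each agent moves toward its neighbours' states."""
    return [
        xi + eps * sum(x[j] - xi for j in neighbors[i])
        for i, xi in enumerate(x)
    ]


# Path graph 0 - 1 - 2 - 3 (hypothetical); eps must stay below 1 / (max degree) = 0.5.
neighbors = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
x = [1.0, 2.0, 3.0, 6.0]
average = sum(x) / len(x)

for _ in range(200):
    x = consensus_step(x, neighbors, eps=0.3)

print(x, average)
```

Because the graph is undirected and connected, the sum of the states is preserved at every step, so the agents converge to the average of the initial values.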
Abstract: Vocabulary is key to English learning. How to memorize words effectively has become a much-discussed topic among teachers and students. Since primary school is the initial stage of English learning, it is imperative to cultivate students' interest in English and find a useful way of learning vocabulary. The study aims to show that the game teaching approach, a commonly used teaching approach, can stimulate students' interest and assist English teaching if it is combined properly with vocabulary teaching. In the study, 70 Grade Four students were randomly chosen from Wenzhou Jingshan Primary School as the subjects, with Class 1 as the experimental group and Class 2 as the control group. The results show that the game teaching approach applied to English vocabulary teaching can help students learn vocabulary more efficiently than the traditional vocabulary teaching method.
Funding: supported by the National Natural Science Foundation of China (No. 62206289).
Abstract: Few multi-agent reinforcement learning (MARL) studies on Google Research Football (GRF) focus on the 11-vs-11 multi-agent full-game scenario, and to the best of our knowledge, no open benchmark on this scenario has been released to the public. In this work, we fill the gap by providing a population-based MARL training pipeline and hyperparameter settings for the multi-agent football scenario that, starting from scratch, outperform the bot with difficulty 1.0 within 2 million steps. Our experiments serve as a reference for the expected performance, across various training configurations, of independent proximal policy optimization (IPPO), a state-of-the-art multi-agent reinforcement learning algorithm in which each agent optimizes its own policy independently. Meanwhile, we release our training framework Light-MALib, which extends the MALib codebase with a distributed and asynchronous implementation and additional analytical tools for football games. Finally, we provide guidance for building strong football AI with population-based training and release diverse pretrained policies for benchmarking. The goal is to provide the community with a head start for anyone experimenting on GRF and a simple-to-use population-based training framework for further improving their agents through self-play. The implementation is available at https://github.com/Shanghai-Digital-Brain-Laboratory/DB-Football.
Funding: supported in part by the Natural Science Foundation of Shandong Province of China (ZR202103040180), the Major Scientific and Technological Projects of CNPC under Grant ZD2019-183-004, and the Fundamental Research Funds for the Central Universities under Grant 20CX05019A.
Abstract: The recent proliferation of Fifth-Generation (5G) and Sixth-Generation (6G) networks has given rise to Vehicular Crowd Sensing (VCS) systems, which solve parking collisions by effectively incentivizing vehicle participation. However, instead of being an isolated module, the incentive mechanism usually interacts with other modules. We capture this synergy and propose Collision-free Parking Recommendation (CPR), a novel VCS system framework that integrates an incentive mechanism, a non-cooperative VCS game, and a multi-agent reinforcement learning algorithm to derive an optimal parking strategy in real time. Specifically, we use an LSTM method to roughly predict parking areas so that recommendations can be made accurately. The incentive mechanism is designed to motivate vehicle participation by considering dynamically priced parking tasks and social network effects. To cope with stochastic parking collisions, the non-cooperative VCS game further analyzes the uncertain interactions between vehicles in parking decision-making. The multi-agent reinforcement learning algorithm then models the VCS campaign as a multi-agent Markov decision process that not only derives the optimal collision-free parking strategy for each vehicle independently, but also proves that the optimal parking strategy for each vehicle is Pareto-optimal. Finally, numerical results demonstrate that, compared with other baselines, CPR can accomplish parking tasks at 99.7% accuracy, efficiently recommending parking spaces.
Abstract: This paper consists of two parts. The first part introduces the strict aspiration as a new aspiration solution concept, which is proved to exist for any cooperative game. The second part deals with the unsolved problem put forward by Bennett by showing that there is at least one payoff that is a balanced, partnered, and equal gains aspiration. The proof is algebraic and constructive, thus providing an algorithm for finding such aspirations.
Funding: financially supported by the National Natural Science Foundation of China (No. 62106060) and in part by the Beijing Natural Science Foundation (No. 4214061).
Abstract: Powered by the Internet and the ever-increasing level of informatization, cyberspace has become increasingly complex and its security situation increasingly grim, which calls for new adaptive and collaborative defense technologies. In this paper, we introduce an extended interactive multi-agent decision model for decentralized cyber defense. Building on the significant advantages of cooperative multi-agent decision-making, the decentralized interactive decision model DI-MDPs and the corresponding interaction and retrieval algorithms are proposed. We then analyze the interactive decision through the calculation and update processes of three matrices, and we also analyze the stability and evolutionary equilibrium of the proposed model. Finally, we evaluate the performance of the proposed algorithms on open data sets and standard test environments. The experimental results show that the proposed work is more applicable to cyber defense.
Funding: supported by the National Natural Science Foundation of China (U2066211, 52177124, 52107134), the Institute of Electrical Engineering, CAS (E155610101), the DNL Cooperation Fund, CAS (DNL202023), and the Youth Innovation Promotion Association of CAS (2019143).
Abstract: Adding a reputation incentive system to peer-to-peer (P2P) energy transactions can encourage prosumers to regulate their trading behavior, which is important for ensuring the efficiency and reliability of P2P transactions. This study proposes a P2P transaction mechanism and game optimization model for prosumers involved in distributed energy sources, considering reputation-value incentives. First, the deviation of P2P transactions and the non-consumption rate of distributed renewable energy in P2P transactions are established as indicators to quantify the factors influencing the reputation value, and a reputation incentive model of P2P transactions for prosumers is constructed. Then, the penalty coefficient is applied to the cost function of the prosumers, and a non-cooperative game model of P2P transactions based on the complete information of multiple prosumers is established. Furthermore, the Nash equilibrium problem is transformed into a nonlinear optimization problem by constructing the modified optimal reaction function, and the Nash equilibrium solution of the game is obtained via a relaxation algorithm. Finally, a modified IEEE 33-node test system based on electricity market P2P and an IEEE 123-node test system are used to analyze and verify the cost and P2P participation of prosumers considering the reputation value. The results show that adding the reputation incentive system can encourage prosumers to standardize their interactive transaction behavior and actively participate in P2P transactions. It can also improve the operating efficiency of the power grid and help refine the P2P transaction mechanism.