Journal Articles
87 articles found
1. UAV-Assisted Dynamic Avatar Task Migration for Vehicular Metaverse Services: A Multi-Agent Deep Reinforcement Learning Approach (cited: 1)
Authors: Jiawen Kang, Junlong Chen, Minrui Xu, Zehui Xiong, Yutao Jiao, Luchao Han, Dusit Niyato, Yongju Tong, Shengli Xie. IEEE/CAA Journal of Automatica Sinica (SCIE, EI, CSCD), 2024, No. 2, pp. 430-445 (16 pages).
Avatars, as promising digital representations and service assistants of users in Metaverses, can enable drivers and passengers to immerse themselves in 3D virtual services and spaces of UAV-assisted vehicular Metaverses. However, avatar tasks include a multitude of human-to-avatar and avatar-to-avatar interactive applications, e.g., augmented reality navigation, which consume intensive computing resources. It is inefficient and impractical for vehicles to process avatar tasks locally. Fortunately, migrating avatar tasks to the nearest roadside units (RSUs) or unmanned aerial vehicles (UAVs) for execution is a promising solution to decrease computation overhead and reduce task processing latency, while the high mobility of vehicles makes it challenging for them to independently make avatar migration decisions based on current and future vehicle status. To address these challenges, in this paper, we propose a novel avatar task migration system based on multi-agent deep reinforcement learning (MADRL) to execute immersive vehicular avatar tasks dynamically. Specifically, we first formulate the problem of avatar task migration from vehicles to RSUs/UAVs as a partially observable Markov decision process that can be solved by MADRL algorithms. We then design the multi-agent proximal policy optimization (MAPPO) approach as the MADRL algorithm for the avatar task migration problem. To overcome slow convergence resulting from the curse of dimensionality and non-stationarity issues caused by shared parameters in MAPPO, we further propose a transformer-based MAPPO approach via sequential decision-making models for the efficient representation of relationships among agents. Finally, to motivate terrestrial or non-terrestrial edge servers (e.g., RSUs or UAVs) to share computation resources and ensure traceability of the sharing records, we apply smart contracts and blockchain technologies to achieve secure sharing management. Numerical results demonstrate that the proposed approach outperforms the MAPPO approach by around 2% and reduces the latency of avatar task execution by approximately 20% in UAV-assisted vehicular Metaverses.
Keywords: avatar, blockchain, Metaverses, multi-agent deep reinforcement learning, transformer, UAVs
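A minimal sketch of the idea in the abstract above: a transformer encoder over per-agent observation tokens can represent agent-to-agent relationships before each vehicle picks a migration action. This is not the authors' code; the dimensions, the class name, and the three-way action set (local / RSU / UAV) are assumptions.

```python
import torch
import torch.nn as nn

class TransformerMigrationActor(nn.Module):
    """Toy transformer-over-agents actor: one migration decision per vehicle."""
    def __init__(self, obs_dim=16, embed_dim=64, n_heads=4, n_layers=2, n_actions=3):
        super().__init__()
        self.embed = nn.Linear(obs_dim, embed_dim)
        layer = nn.TransformerEncoderLayer(embed_dim, n_heads,
                                           dim_feedforward=128, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, n_layers)
        self.head = nn.Linear(embed_dim, n_actions)

    def forward(self, obs):                  # obs: (batch, n_agents, obs_dim)
        tokens = self.embed(obs)             # one token per vehicle/agent
        ctx = self.encoder(tokens)           # attention models agent-agent relations
        return torch.distributions.Categorical(logits=self.head(ctx))

# Usage: sample one decision (0 = local, 1 = RSU, 2 = UAV, assumed) per vehicle.
actor = TransformerMigrationActor()
dist = actor(torch.randn(1, 5, 16))          # a fleet of 5 vehicles
actions = dist.sample()                      # shape (1, 5)
```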
2. Discovering Latent Variables for the Tasks With Confounders in Multi-Agent Reinforcement Learning
Authors: Kun Jiang, Wenzhang Liu, Yuanda Wang, Lu Dong, Changyin Sun. IEEE/CAA Journal of Automatica Sinica (SCIE, EI, CSCD), 2024, No. 7, pp. 1591-1604 (14 pages).
Efficient exploration in complex coordination tasks has been considered a challenging problem in multi-agent reinforcement learning (MARL). It is significantly more difficult for tasks with latent variables that agents cannot directly observe. However, most of the existing latent variable discovery methods lack a clear representation of latent variables and an effective evaluation of the influence of latent variables on the agents. In this paper, we propose a new MARL algorithm based on the soft actor-critic method for complex continuous control tasks with confounders. It is called the multi-agent soft actor-critic with latent variable (MASAC-LV) algorithm, which uses variational inference theory to infer a compact latent variable representation space from a large amount of offline experience. Besides, we derive the counterfactual policy whose input has no latent variables and quantify the difference between the actual policy and the counterfactual policy via a distance function. This quantified difference is considered an intrinsic motivation that gives additional rewards based on how much the latent variable affects each agent. The proposed algorithm is evaluated on two collaboration tasks with confounders, and the experimental results demonstrate the effectiveness of MASAC-LV compared to other baseline algorithms.
Keywords: latent variable model, maximum entropy, multi-agent reinforcement learning (MARL), multi-agent system
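A minimal sketch of the intrinsic-reward mechanism described in the abstract above, under assumptions the paper does not spell out here: both policies are diagonal Gaussians and the "distance function" is a KL divergence between the actual policy (conditioned on the latent variable) and the counterfactual policy (latent zeroed out).

```python
import torch
import torch.nn as nn
import torch.distributions as D

class GaussianActor(nn.Module):
    """Toy actor: maps [obs, latent] to a diagonal-Gaussian action distribution."""
    def __init__(self, in_dim, act_dim):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(in_dim, 64), nn.Tanh(),
                                 nn.Linear(64, 2 * act_dim))
    def forward(self, x):
        mu, log_std = self.net(x).chunk(2, dim=-1)
        return mu, log_std.clamp(-5, 2)

def intrinsic_reward(actor, obs, latent, beta=0.1):
    mu, log_std = actor(torch.cat([obs, latent], dim=-1))                       # actual policy
    mu0, log_std0 = actor(torch.cat([obs, torch.zeros_like(latent)], dim=-1))   # counterfactual
    kl = D.kl_divergence(D.Normal(mu, log_std.exp()), D.Normal(mu0, log_std0.exp()))
    return beta * kl.sum(dim=-1)        # larger when the latent variable matters more

actor = GaussianActor(in_dim=8 + 4, act_dim=2)
obs, latent = torch.randn(32, 8), torch.randn(32, 4)
r_int = intrinsic_reward(actor, obs, latent)   # shape (32,), added to the env reward
```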
3. Safety-Constrained Multi-Agent Reinforcement Learning for Power Quality Control in Distributed Renewable Energy Networks
Authors: Yongjiang Zhao, Haoyi Zhong, Chang Cyoon Lim. Computers, Materials & Continua (SCIE, EI), 2024, No. 4, pp. 449-471 (23 pages).
This paper examines the difficulties of managing distributed power systems, notably due to the increasing use of renewable energy sources, and focuses on voltage control challenges exacerbated by their variable nature in modern power grids. To tackle the unique challenges of voltage control in distributed renewable energy networks, researchers are increasingly turning towards multi-agent reinforcement learning (MARL). However, MARL raises safety concerns due to the unpredictability of agent actions during their exploration phase. This unpredictability can lead to unsafe control measures. To mitigate these safety concerns in MARL-based voltage control, our study introduces a novel approach: Safety-Constrained Multi-Agent Reinforcement Learning (SC-MARL). This approach incorporates a specialized safety constraint module specifically designed for voltage control within the MARL framework. This module ensures that the MARL agents carry out voltage control actions safely. The experiments demonstrate that, in the 33-bus, 141-bus, and 322-bus power systems, employing SC-MARL for voltage control reduced the Voltage Out of Control Rate (%V.out) from 0.43, 0.24, and 2.95 to 0, 0.01, and 0.03, respectively. Additionally, the Reactive Power Loss (Q loss) decreased from 0.095, 0.547, and 0.017 to 0.062, 0.452, and 0.016 in the corresponding systems.
Keywords: power quality control, multi-agent reinforcement learning, safety-constrained MARL
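A minimal sketch of the general idea of a safety layer for exploration, not the paper's constraint module: under an assumed linearized voltage-sensitivity model, an agent's reactive-power action is projected back so the predicted bus voltage stays inside [v_min, v_max] p.u.

```python
def safe_project(q_action, v_now, dv_dq, v_min=0.95, v_max=1.05):
    """q_action: proposed reactive-power setpoint; dv_dq: assumed voltage sensitivity."""
    v_pred = v_now + dv_dq * q_action
    if v_pred > v_max:                      # scale the action back onto the boundary
        q_action = (v_max - v_now) / dv_dq
    elif v_pred < v_min:
        q_action = (v_min - v_now) / dv_dq
    return q_action

# An exploratory action that would push the bus to 1.08 p.u. is clipped:
print(safe_project(q_action=0.8, v_now=1.00, dv_dq=0.10))   # -> 0.5
```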
4. Unleashing the Power of Multi-Agent Reinforcement Learning for Algorithmic Trading in the Digital Financial Frontier and Enterprise Information Systems
Authors: Saket Sarin, Sunil K. Singh, Sudhakar Kumar, Shivam Goyal, Brij Bhooshan Gupta, Wadee Alhalabi, Varsha Arya. Computers, Materials & Continua (SCIE, EI), 2024, No. 8, pp. 3123-3138 (16 pages).
In the rapidly evolving landscape of today's digital economy, Financial Technology (Fintech) emerges as a transformative force, propelled by the dynamic synergy between Artificial Intelligence (AI) and Algorithmic Trading. Our in-depth investigation delves into the intricacies of merging Multi-Agent Reinforcement Learning (MARL) and Explainable AI (XAI) within Fintech, aiming to refine Algorithmic Trading strategies. Through meticulous examination, we uncover the nuanced interactions of AI-driven agents as they collaborate and compete within the financial realm, employing sophisticated deep learning techniques to enhance the clarity and adaptability of trading decisions. These AI-infused Fintech platforms harness collective intelligence to unearth trends, mitigate risks, and provide tailored financial guidance, fostering benefits for individuals and enterprises navigating the digital landscape. Our research holds the potential to revolutionize finance, opening doors to fresh avenues for investment and asset management in the digital age. Additionally, our statistical evaluation yields encouraging results, with metrics such as Accuracy = 0.85, Precision = 0.88, and F1 Score = 0.86, reaffirming the efficacy of our approach within Fintech and emphasizing its reliability and innovative prowess.
Keywords: neurodynamic Fintech, multi-agent reinforcement learning, algorithmic trading, digital financial frontier
5. Regional Multi-Agent Cooperative Reinforcement Learning for City-Level Traffic Grid Signal
Authors: Yisha Li, Ya Zhang, Xinde Li, Changyin Sun. IEEE/CAA Journal of Automatica Sinica (SCIE, EI, CSCD), 2024, No. 9, pp. 1987-1998 (12 pages).
This article studies the effective traffic signal control problem of multiple intersections in a city-level traffic system. A novel regional multi-agent cooperative reinforcement learning algorithm called RegionSTLight is proposed to improve traffic efficiency. Firstly, a regional multi-agent Q-learning framework is proposed, which can equivalently decompose the global Q value of the traffic system into the local values of several regions. Based on the framework and the idea of human-machine cooperation, a dynamic zoning method is designed to divide the traffic network into several strongly coupled regions according to real-time traffic flow densities. In order to achieve better cooperation inside each region, a lightweight spatio-temporal fusion feature extraction network is designed. The experiments in synthetic, real-world, and city-level scenarios show that the proposed RegionSTLight converges more quickly, is more stable, and obtains better asymptotic performance compared to state-of-the-art models.
Keywords: human-machine cooperation, mixed domain attention mechanism, multi-agent reinforcement learning, spatio-temporal feature, traffic signal control
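A minimal sketch of the two ideas in the abstract above, under simplifying assumptions (a flood-fill over dense adjacent intersections stands in for the paper's dynamic zoning, and the global Q value is taken as the sum of regional values):

```python
import numpy as np

def dynamic_zoning(adjacency, density, threshold=0.6):
    """Return a region id per intersection: dense, adjacent intersections share a region."""
    n = len(density)
    region = [-1] * n
    rid = 0
    for s in range(n):
        if region[s] != -1:
            continue
        region[s] = rid
        stack = [s]
        while stack:                              # flood-fill over dense neighbours
            u = stack.pop()
            for v in range(n):
                if adjacency[u][v] and region[v] == -1 \
                        and density[u] > threshold and density[v] > threshold:
                    region[v] = rid
                    stack.append(v)
        rid += 1
    return region

def global_q(regional_q):
    return sum(regional_q)                        # global Q decomposed over regions

adj = np.array([[0, 1, 0, 0], [1, 0, 1, 0], [0, 1, 0, 1], [0, 0, 1, 0]])
print(dynamic_zoning(adj, density=[0.9, 0.8, 0.3, 0.7]))   # -> [0, 0, 1, 2]
```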
6. A survey on multi-agent reinforcement learning and its application
Authors: Zepeng Ning, Lihua Xie. Journal of Automation and Intelligence, 2024, No. 2, pp. 73-91 (19 pages).
Multi-agent reinforcement learning (MARL) has been a rapidly evolving field. This paper presents a comprehensive survey of MARL and its applications. We trace the historical evolution of MARL, highlight its progress, and discuss related survey works. Then, we review the existing works addressing inherent challenges and those focusing on diverse applications. Some representative stochastic games, MARL means, spatial forms of MARL, and task classifications are revisited. We then conduct an in-depth exploration of a variety of challenges encountered in MARL applications. We also address critical operational aspects, such as hyperparameter tuning and computational complexity, which are pivotal in practical implementations of MARL. Afterward, we give a thorough overview of the applications of MARL to intelligent machines and devices, chemical engineering, biotechnology, healthcare, and societal issues, which highlights the extensive potential and relevance of MARL within both current and future technological contexts. Our survey also encompasses a detailed examination of benchmark environments used in MARL research, which are instrumental in evaluating MARL algorithms and demonstrate the adaptability of MARL to diverse application scenarios. In the end, we give our prospects for MARL and discuss its related techniques and potential future applications.
Keywords: benchmark environments, multi-agent reinforcement learning, multi-agent systems, stochastic games
7. Performance Evaluation of Multi-Agent Reinforcement Learning Algorithms
Authors: Abdulghani M. Abdulghani, Mokhles M. Abdulghani, Wilbur L. Walters, Khalid H. Abed. Intelligent Automation & Soft Computing, 2024, No. 2, pp. 337-352 (16 pages).
Multi-Agent Reinforcement Learning (MARL) has proven to be successful in cooperative assignments. MARL is used to investigate how autonomous agents with the same interests can connect and act in one team. MARL cooperation scenarios are explored in recreational cooperative augmented reality environments, as well as real-world scenarios in robotics. In this paper, we explore the realm of MARL and its potential applications in cooperative assignments. Our focus is on developing a multi-agent system that can collaborate to attack or defend against enemies and achieve victory with minimal damage. To accomplish this, we utilize the StarCraft Multi-Agent Challenge (SMAC) environment and train four MARL algorithms: Q-learning with Mixtures of Experts (QMIX), Value-Decomposition Network (VDN), Multi-agent Proximal Policy Optimizer (MAPPO), and Multi-Agent Actor Attention Critic (MAA2C). These algorithms allow multiple agents to cooperate in a specific scenario to achieve the targeted mission. Our results show that the QMIX algorithm outperforms the other three algorithms in the attacking scenario, while the VDN algorithm achieves the best results in the defending scenario. Specifically, the VDN algorithm reaches the highest value of battle won mean and the lowest value of dead allies mean. Our research demonstrates the potential for MARL algorithms to be used in real-world applications, such as controlling multiple robots to provide helpful services or coordinating teams of agents to accomplish tasks that would be impossible for a human to do. The SMAC environment provides a unique opportunity to test and evaluate MARL algorithms in a challenging and dynamic environment, and our results show that these algorithms can be used to achieve victory with minimal damage.
Keywords: reinforcement learning (RL), multi-agent, MARL, SMAC, VDN, QMIX, MAPPO
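For context on two of the value-decomposition methods compared above, here is a minimal sketch of their standard mixing formulations (not the paper's training code): VDN sums per-agent Q values, while QMIX mixes them monotonically using state-conditioned, non-negative weights produced by hypernetworks.

```python
import torch
import torch.nn as nn

def vdn_mix(agent_qs):                        # agent_qs: (batch, n_agents)
    return agent_qs.sum(dim=1, keepdim=True)  # Q_tot = sum_i Q_i

class QMixer(nn.Module):
    def __init__(self, n_agents, state_dim, embed_dim=32):
        super().__init__()
        self.n_agents, self.embed_dim = n_agents, embed_dim
        self.hyper_w1 = nn.Linear(state_dim, n_agents * embed_dim)
        self.hyper_w2 = nn.Linear(state_dim, embed_dim)
        self.hyper_b1 = nn.Linear(state_dim, embed_dim)
        self.hyper_b2 = nn.Linear(state_dim, 1)

    def forward(self, agent_qs, state):       # (batch, n_agents), (batch, state_dim)
        w1 = torch.abs(self.hyper_w1(state)).view(-1, self.n_agents, self.embed_dim)
        b1 = self.hyper_b1(state).view(-1, 1, self.embed_dim)
        hidden = torch.relu(torch.bmm(agent_qs.unsqueeze(1), w1) + b1)
        w2 = torch.abs(self.hyper_w2(state)).view(-1, self.embed_dim, 1)
        b2 = self.hyper_b2(state).view(-1, 1, 1)
        return (torch.bmm(hidden, w2) + b2).squeeze(-1)   # Q_tot, monotonic in each Q_i

qs, s = torch.randn(8, 3), torch.randn(8, 20)
print(vdn_mix(qs).shape, QMixer(3, 20)(qs, s).shape)      # torch.Size([8, 1]) twice
```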
8. An Optimal Control-Based Distributed Reinforcement Learning Framework for A Class of Non-Convex Objective Functionals of the Multi-Agent Network (cited: 2)
Authors: Zhe Chen, Ning Li. IEEE/CAA Journal of Automatica Sinica (SCIE, EI, CSCD), 2023, No. 11, pp. 2081-2093 (13 pages).
This paper studies a novel distributed optimization problem that aims to minimize the sum of the non-convex objective functionals of the multi-agent network under privacy protection, which means that the local objective of each agent is unknown to the others. The above problem involves complexity simultaneously in the time and space aspects. Yet existing works about distributed optimization mainly consider privacy protection in the space aspect, where the decision variable is a vector with finite dimensions. In contrast, when the time aspect is considered in this paper, the decision variable is a continuous function of time. Hence, the minimization of the overall functional belongs to the calculus of variations. Traditional works usually aim to seek the optimal decision function. Due to privacy protection and non-convexity, the Euler-Lagrange equation of the proposed problem is a complicated partial differential equation. Hence, we seek the optimal decision derivative function rather than the decision function. This manner can be regarded as seeking the control input for an optimal control problem, for which we propose a centralized reinforcement learning (RL) framework. In the space aspect, we further present a distributed reinforcement learning framework to deal with the impact of privacy protection. Finally, rigorous theoretical analysis and simulation validate the effectiveness of our framework.
Keywords: distributed optimization, multi-agent, optimal control, reinforcement learning (RL)
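A hedged sketch of the kind of variational problem the abstract above describes; the generic form below is an assumption, since the abstract does not give the exact functionals. Each agent i privately holds a possibly non-convex functional of the decision trajectory, and the network minimizes their sum:

```latex
\min_{x(\cdot)}\; J\bigl(x(\cdot)\bigr)
  \;=\; \sum_{i=1}^{N} \int_{t_0}^{t_f} f_i\bigl(x(t), \dot{x}(t), t\bigr)\,\mathrm{d}t
```

Rather than solving the Euler-Lagrange equation for x(t), the framework treats the derivative u(t) = dx/dt as the control input of an optimal control problem and learns it with RL, first centrally and then distributively across agents.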
9. Multi-Agent Deep Reinforcement Learning for Efficient Computation Offloading in Mobile Edge Computing
Authors: Tianzhe Jiao, Xiaoyue Feng, Chaopeng Guo, Dongqi Wang, Jie Song. Computers, Materials & Continua (SCIE, EI), 2023, No. 9, pp. 3585-3603 (19 pages).
Mobile-edge computing (MEC) is a promising technology for the fifth-generation (5G) and sixth-generation (6G) architectures, which provides resourceful computing capabilities for Internet of Things (IoT) devices, such as virtual reality, mobile devices, and smart cities. In general, these IoT applications always bring higher energy consumption than traditional applications, which are usually energy-constrained. To provide persistent energy, many references have studied the offloading problem to save energy consumption. However, the dynamic environment dramatically increases the optimization difficulty of the offloading decision. In this paper, we aim to minimize the energy consumption of the entire MEC system under the latency constraint by fully considering the dynamic environment. Under Markov games, we propose a multi-agent deep reinforcement learning approach based on the bi-level actor-critic learning structure to jointly optimize the offloading decision and resource allocation, which can solve the combinatorial optimization problem using an asymmetric method and compute the Stackelberg equilibrium as a better convergence point than the Nash equilibrium in terms of Pareto superiority. Our method can better adapt to a dynamic environment during data transmission than the single-agent strategy and can effectively tackle the coordination problem in the multi-agent environment. The simulation results show that the proposed method decreases the total computational overhead by 17.8% compared to the actor-critic-based method, and by 31.3%, 36.5%, and 44.7% compared with random offloading, all local execution, and all offloading execution, respectively.
Keywords: computation offloading, multi-agent deep reinforcement learning, mobile-edge computing, latency, energy efficiency
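A minimal sketch of the asymmetric (leader-follower) solution concept mentioned in the abstract above, on a toy discrete game rather than the paper's MEC model: the leader (say, the offloading decision) anticipates the follower's best response (say, the resource allocation) before committing.

```python
import numpy as np

def stackelberg(leader_cost, follower_cost):
    """Cost matrices indexed [leader_action, follower_action]; both players minimize."""
    best = None
    for a_l in range(leader_cost.shape[0]):
        a_f = int(np.argmin(follower_cost[a_l]))          # follower's best response
        c = leader_cost[a_l, a_f]
        if best is None or c < best[2]:
            best = (a_l, a_f, c)
    return best

leader_cost = np.array([[4.0, 1.0], [2.0, 3.0]])
follower_cost = np.array([[1.0, 5.0], [6.0, 2.0]])
print(stackelberg(leader_cost, follower_cost))   # (1, 1, 3.0): the leader commits to row 1
```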
10. Cooperative multi-target hunting by unmanned surface vehicles based on multi-agent reinforcement learning
Authors: Jiawei Xia, Yasong Luo, Zhikun Liu, Yalun Zhang, Haoran Shi, Zhong Liu. Defence Technology (SCIE, EI, CAS, CSCD), 2023, No. 11, pp. 80-94 (15 pages).
To solve the problem of multi-target hunting by an unmanned surface vehicle (USV) fleet, a hunting algorithm based on multi-agent reinforcement learning is proposed. Firstly, the hunting environment and kinematic model without boundary constraints are built, and the criteria for successful target capture are given. Then, the cooperative hunting problem of a USV fleet is modeled as a decentralized partially observable Markov decision process (Dec-POMDP), and a distributed partially observable multi-target hunting Proximal Policy Optimization (DPOMH-PPO) algorithm applicable to USVs is proposed. In addition, an observation model, a reward function and the action space applicable to multi-target hunting tasks are designed. To deal with the dynamic change of the observational feature dimension input by partially observable systems, a feature embedding block is proposed. By combining the two feature compression methods of column-wise max pooling (CMP) and column-wise average pooling (CAP), observational feature encoding is established. Finally, the centralized training and decentralized execution framework is adopted to complete the training of the hunting strategy. Each USV in the fleet shares the same policy and performs actions independently. Simulation experiments have verified the effectiveness of the DPOMH-PPO algorithm in test scenarios with different numbers of USVs. Moreover, the advantages of the proposed model are comprehensively analyzed from the aspects of algorithm performance, migration effect in task scenarios and self-organization capability after being damaged, verifying the potential deployment and application of DPOMH-PPO in real environments.
Keywords: unmanned surface vehicles, multi-agent deep reinforcement learning, cooperative hunting, feature embedding, proximal policy optimization
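A minimal sketch of the CMP/CAP feature embedding idea from the abstract above (the shapes are assumptions, not the paper's network): a variable number of observed-target feature rows is compressed into a fixed-size encoding by concatenating column-wise max pooling and column-wise average pooling.

```python
import torch

def feature_embedding(target_feats):
    """target_feats: (n_observed_targets, feat_dim); n can change every step."""
    cmp = target_feats.max(dim=0).values      # column-wise max pooling (CMP)
    cap = target_feats.mean(dim=0)            # column-wise average pooling (CAP)
    return torch.cat([cmp, cap], dim=-1)      # fixed length 2 * feat_dim

# Three targets observed this step, five the next -- the encoding size is unchanged.
print(feature_embedding(torch.randn(3, 6)).shape)   # torch.Size([12])
print(feature_embedding(torch.randn(5, 6)).shape)   # torch.Size([12])
```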
11. Privacy Preserving Demand Side Management Method via Multi-Agent Reinforcement Learning
Authors: Feiye Zhang, Qingyu Yang, Dou An. IEEE/CAA Journal of Automatica Sinica (SCIE, EI, CSCD), 2023, No. 10, pp. 1984-1999 (16 pages).
The smart grid utilizes demand side management technology to motivate energy users to cut demand during peak power consumption periods, which greatly improves the operation efficiency of the power grid. However, as the number of energy users participating in the smart grid continues to increase, the demand side management strategy of an individual agent is greatly affected by the dynamic strategies of other agents. In addition, the existing demand side management methods, which need to obtain users' power consumption information, seriously threaten users' privacy. To address the dynamic issue in the multi-microgrid demand side management model, a novel multi-agent reinforcement learning method based on the centralized training and decentralized execution paradigm is presented to mitigate the damage to training performance caused by the instability of training experience. In order to protect users' privacy, we design a neural network with fixed parameters as the encryptor to transform the users' energy consumption information from low-dimensional to high-dimensional, and theoretically prove that the proposed encryptor-based privacy preserving method will not affect the convergence property of the reinforcement learning algorithm. We verify the effectiveness of the proposed demand side management scheme with real-world energy consumption data from Xi'an, Shaanxi, China. Simulation results show that the proposed method can effectively improve users' satisfaction while reducing the bill payment compared with traditional reinforcement learning (RL) methods (i.e., deep Q learning (DQN), deep deterministic policy gradient (DDPG), QMIX and multi-agent deep deterministic policy gradient (MADDPG)). The results also demonstrate that the proposed privacy protection scheme can effectively protect users' privacy while ensuring the performance of the algorithm.
Keywords: centralized training and decentralized execution, demand side management, multi-agent reinforcement learning, privacy preserving
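A minimal sketch of the fixed-parameter encryptor idea from the abstract above; the dimensions and architecture are assumptions. A network whose weights are frozen after random initialization maps each user's low-dimensional consumption profile to a high-dimensional representation, and only that representation is shared.

```python
import torch
import torch.nn as nn

class FixedEncryptor(nn.Module):
    def __init__(self, in_dim=24, out_dim=256, seed=0):
        super().__init__()
        torch.manual_seed(seed)                        # same fixed weights on every run
        self.net = nn.Sequential(nn.Linear(in_dim, 128), nn.Tanh(),
                                 nn.Linear(128, out_dim))
        for p in self.parameters():
            p.requires_grad_(False)                    # never trained

    def forward(self, consumption):
        return self.net(consumption)

encryptor = FixedEncryptor()
hourly_load = torch.rand(1, 24)                        # 24-hour consumption profile
shared = encryptor(hourly_load)                        # (1, 256), sent instead of raw data
```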
12. UAV Frequency-based Crowdsensing Using Grouping Multi-agent Deep Reinforcement Learning
Authors: Cui Zhang, En Wang, Funing Yang, Yongjian Yang, Nan Jiang. Computer Science (《计算机科学》) (CSCD, Peking University Core), 2023, No. 2, pp. 57-68 (12 pages).
Mobile CrowdSensing (MCS) is a promising sensing paradigm that recruits users to cooperatively perform sensing tasks. Recently, unmanned aerial vehicles (UAVs), as powerful sensing devices, have been used to replace user participation and carry out some special tasks, such as epidemic monitoring and earthquake rescue. In this paper, we focus on scheduling UAVs to sense the task Points-of-Interest (PoIs) with different frequency coverage requirements. To accomplish the sensing task, the scheduling strategy needs to consider the coverage requirement, geographic fairness and energy charging simultaneously. We consider the complex interaction among UAVs and propose a grouping multi-agent deep reinforcement learning approach (G-MADDPG) to schedule UAVs distributively. G-MADDPG groups all UAVs into teams by a distance-based clustering algorithm (DCA), then regards each team as an agent. In this way, G-MADDPG solves the problem that the training time of traditional MADDPG is too long to converge when the number of UAVs is large, and the trade-off between training time and result accuracy can be controlled flexibly by adjusting the number of teams. Extensive simulation results show that our scheduling strategy has better performance compared with three baselines and is flexible in balancing training time and result accuracy.
Keywords: UAV, crowdsensing, frequency coverage, grouping multi-agent deep reinforcement learning
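A minimal sketch of the grouping step described above; this greedy variant and the distance threshold are assumptions, not the paper's DCA. Each UAV joins the first existing team whose centre is within the threshold, otherwise it starts a new team, and each resulting team is then treated as one agent.

```python
import numpy as np

def distance_based_clustering(positions, radius=100.0):
    teams, centres = [], []
    for i, p in enumerate(positions):
        for t, c in enumerate(centres):
            if np.linalg.norm(p - c) <= radius:
                teams[t].append(i)
                centres[t] = np.mean(positions[teams[t]], axis=0)   # update team centre
                break
        else:
            teams.append([i])
            centres.append(p.astype(float))
    return teams

uav_xy = np.array([[0, 0], [30, 40], [500, 500], [520, 480]])
print(distance_based_clustering(uav_xy))   # -> [[0, 1], [2, 3]]
```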
13. Multi-Agent Dynamic Area Coverage Based on Reinforcement Learning with Connected Agents
Authors: Fatih Aydemir, Aydin Cetin. Computer Systems Science & Engineering (SCIE, EI), 2023, No. 4, pp. 215-230 (16 pages).
Dynamic area coverage with small unmanned aerial vehicle (UAV) systems is one of the major research topics due to limited payloads and the difficulty of the decentralized decision-making process. Collaborative behavior of a group of UAVs in an unknown environment is another hard problem to be solved. In this paper, we propose a method for decentralized execution of multi-UAVs for dynamic area coverage problems. The proposed decentralized decision-making dynamic area coverage (DDMDAC) method utilizes reinforcement learning (RL), where each UAV is represented by an intelligent agent that learns policies to create collaborative behaviors in a partially observable environment. Intelligent agents increase their global observations by gathering information about the environment by connecting with other agents. The connectivity provides a consensus for the decision-making process while each agent takes decisions. At each step, agents acquire all reachable agents' states, determine the optimum location for maximal area coverage, and receive a reward based on the coverage rate of the target area, respectively. The method was tested in a multi-agent actor-critic simulation platform. In the study, it has been considered that each UAV has a certain communication distance, as in real applications. The results show that UAVs with limited communication distance can act jointly in the target area and can successfully cover the area without guidance from a central command unit.
Keywords: dynamic environments, multi-agent reinforcement learning, dynamic area coverage
14. Multi-Agent Hierarchical Graph Attention Reinforcement Learning for Grid-Aware Energy Management
Authors: Feng Bingyi, Feng Mingxiao, Wang Minrui, Zhou Wengang, Li Houqiang. ZTE Communications, 2023, No. 3, pp. 11-21 (11 pages).
The increasing adoption of renewable energy has posed challenges for voltage regulation in power distribution networks. Grid-aware energy management, which includes the control of smart inverters and energy management systems, is a trending way to mitigate this problem. However, existing multi-agent reinforcement learning methods for grid-aware energy management have not sufficiently considered the importance of agent cooperation and the unique characteristics of the grid, which leads to limited performance. In this study, we propose a new approach named the multi-agent hierarchical graph attention reinforcement learning framework (MAHGA) to stabilize the voltage. Specifically, under the paradigm of centralized training and decentralized execution, we model the power distribution network as a novel hierarchical graph containing the agent-level topology and the bus-level topology. Then a hierarchical graph attention model is devised to capture the complex correlations between agents. Moreover, we incorporate graph contrastive learning as an auxiliary task in the reinforcement learning process to improve representation learning from graphs. Experiments on several real-world scenarios reveal that our approach achieves the best performance and can reduce the number of voltage violations remarkably.
Keywords: demand-side management, graph neural networks, multi-agent reinforcement learning, voltage regulation
15. Knowledge transfer in multi-agent reinforcement learning with incremental number of agents (cited: 2)
Authors: Liu Wenzhang, Dong Lu, Liu Jian, Sun Changyin. Journal of Systems Engineering and Electronics (SCIE, EI, CSCD), 2022, No. 2, pp. 447-460 (14 pages).
In this paper, the reinforcement learning method for cooperative multi-agent systems (MAS) with an incremental number of agents is studied. The existing multi-agent reinforcement learning approaches deal with a MAS with a specific number of agents, and can learn well-performed policies. However, if there is an increasing number of agents, the previously learned policies may not perform well in the current scenario. The new agents need to learn from scratch to find optimal policies with the others, which may slow down the learning speed of the whole team. To solve that problem, in this paper, we propose a new algorithm to take full advantage of the historical knowledge which was learned before, and transfer it from the previous agents to the new agents. Since the previous agents have been trained well in the source environment, they are treated as teacher agents in the target environment. Correspondingly, the new agents are called student agents. To enable the student agents to learn from the teacher agents, we first modify the input nodes of the networks for the teacher agents to adapt to the current environment. Then, the teacher agents take the observations of the student agents as input, and output the advised actions and values as supervising information. Finally, the student agents combine the reward from the environment and the supervising information from the teacher agents, and learn the optimal policies with modified loss functions. By taking full advantage of the knowledge of the teacher agents, the search space for the student agents will be reduced significantly, which can accelerate the learning speed of the holistic system. The proposed algorithm is verified in some multi-agent simulation environments, and its efficiency has been demonstrated by the experimental results.
Keywords: knowledge transfer, multi-agent reinforcement learning (MARL), new agents
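A minimal sketch of the modified loss described in the abstract above, under an assumed actor-critic form (the weights and the exact terms are assumptions): the student combines the usual TD error on the environment reward with supervision terms that pull its policy and value estimates toward the teacher's advised actions and values on the same observations.

```python
import torch
import torch.nn.functional as F

def student_loss(student_logits, student_value, td_target,
                 teacher_logits, teacher_value, lam_pi=0.5, lam_v=0.5):
    rl_loss = F.mse_loss(student_value, td_target)                     # learn from the env
    advise_pi = F.kl_div(F.log_softmax(student_logits, dim=-1),
                         F.softmax(teacher_logits, dim=-1),
                         reduction="batchmean")                        # imitate advised actions
    advise_v = F.mse_loss(student_value, teacher_value.detach())       # match advised values
    return rl_loss + lam_pi * advise_pi + lam_v * advise_v

loss = student_loss(torch.randn(16, 5), torch.randn(16, 1), torch.randn(16, 1),
                    torch.randn(16, 5), torch.randn(16, 1))
```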
16. Multi-Agent Reinforcement Learning for Resource Allocation in IoT Networks with Edge Computing (cited: 9)
Authors: Xiaolan Liu, Jiadong Yu, Zhiyong Feng, Yue Gao. China Communications (SCIE, CSCD), 2020, No. 9, pp. 220-236 (17 pages).
To support popular Internet of Things (IoT) applications such as virtual reality and mobile games, edge computing provides a front-end distributed computing archetype of centralized cloud computing with low latency and distributed data processing. However, it is challenging for multiple users to offload their computation tasks because they are competing for spectrum and computation as well as Radio Access Technology (RAT) resources. In this paper, we investigate the computation offloading mechanism of multiple selfish users with resource allocation in IoT edge computing networks by formulating it as a stochastic game. Each user is a learning agent observing its local network environment to learn optimal decisions on either local computing or edge computing, with the goal of minimizing the long-term system cost by choosing its transmit power level, RAT and sub-channel without knowing any information about the other users. Since users' decisions are coupled at the gateway, we define the reward function of each user by considering the aggregated effect of the other users. Therefore, a multi-agent reinforcement learning framework is developed to solve the game with the proposed Independent Learners based Multi-Agent Q-learning (IL-based MA-Q) algorithm. Simulations demonstrate that the proposed IL-based MA-Q algorithm is feasible to solve the formulated problem and is more energy efficient without extra cost on channel estimation at the centralized gateway. Finally, compared with the other three benchmark algorithms, it has better system cost performance and achieves distributed computation offloading.
Keywords: edge computing, multi-agent reinforcement learning, Internet of Things
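A minimal sketch of the independent-learner idea from the abstract above; the state and action sizes are assumptions, not the paper's model. Each user runs its own tabular Q-learner whose action is the joint choice of (transmit power level, RAT, sub-channel), using only its local state and cost.

```python
import numpy as np

class IndependentQLearner:
    def __init__(self, n_states, n_power=3, n_rat=2, n_channel=4,
                 alpha=0.1, gamma=0.9, eps=0.1):
        self.n_actions = n_power * n_rat * n_channel          # composite action space
        self.Q = np.zeros((n_states, self.n_actions))
        self.alpha, self.gamma, self.eps = alpha, gamma, eps
        self.shape = (n_power, n_rat, n_channel)

    def act(self, s):
        a = (np.random.randint(self.n_actions) if np.random.rand() < self.eps
             else int(self.Q[s].argmax()))
        return a, np.unravel_index(a, self.shape)             # (power, rat, channel)

    def update(self, s, a, cost, s_next):
        target = -cost + self.gamma * self.Q[s_next].max()    # minimize long-term cost
        self.Q[s, a] += self.alpha * (target - self.Q[s, a])

agent = IndependentQLearner(n_states=10)
a, (power, rat, channel) = agent.act(0)
agent.update(0, a, cost=1.2, s_next=3)
```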
17. A single-task and multi-decision evolutionary game model based on multi-agent reinforcement learning (cited: 3)
Authors: Ma Ye, Chang Tianqing, Fan Wenhui. Journal of Systems Engineering and Electronics (SCIE, EI, CSCD), 2021, No. 3, pp. 642-657 (16 pages).
In the evolutionary game of the same task for groups, changes in game rules, personal interests, the crowd size, and external supervision cause uncertain effects on individual decision-making and game results. In the Markov decision framework, a single-task multi-decision evolutionary game model based on multi-agent reinforcement learning is proposed to explore the evolutionary rules in the process of a game. The model can improve the result of an evolutionary game and facilitate the completion of the task. First, based on multi-agent theory, to solve the existing problems in the original model, a negative feedback tax penalty mechanism is proposed to guide the strategy selection of individuals in the group. In addition, in order to evaluate the evolutionary game results of the group in the model, a calculation method of the group intelligence level is defined. Secondly, the Q-learning algorithm is used to improve the guiding effect of the negative feedback tax penalty mechanism. In the model, the selection strategy of the Q-learning algorithm is improved and a bounded rationality evolutionary game strategy is proposed based on the rules of evolutionary games and the consideration of the bounded rationality of individuals. Finally, simulation results show that the proposed model can effectively guide individuals to choose cooperation strategies which are beneficial to task completion and stability under different negative feedback factor values and different group sizes, so as to improve the group intelligence level.
Keywords: multi-agent reinforcement learning, evolutionary game, Q-learning
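A minimal sketch of one way a negative feedback tax penalty can enter the Q-learning reward; the payoff form and the tax factor are assumptions, not the paper's model. The more common defection becomes in the group, the more a defector's payoff is taxed, which feeds back against defection.

```python
def taxed_payoff(base_payoff, is_defector, defection_rate, tax_factor=2.0):
    penalty = tax_factor * defection_rate if is_defector else 0.0   # negative feedback
    return base_payoff - penalty            # used as the Q-learning reward

# Defecting becomes less attractive as the group's defection rate rises:
print(taxed_payoff(5.0, True, defection_rate=0.2))    # 4.6
print(taxed_payoff(5.0, True, defection_rate=0.8))    # 3.4
print(taxed_payoff(3.0, False, defection_rate=0.8))   # 3.0
```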
18. Multi-agent reinforcement learning for edge information sharing in vehicular networks (cited: 3)
Authors: Ruyan Wang, Xue Jiang, Yujie Zhou, Zhidu Li, Dapeng Wu, Tong Tang, Alexander Fedotov, Vladimir Badenko. Digital Communications and Networks (SCIE, CSCD), 2022, No. 3, pp. 267-277 (11 pages).
To guarantee the heterogeneous delay requirements of diverse vehicular services, it is necessary to design a fully cooperative policy for both Vehicle to Infrastructure (V2I) and Vehicle to Vehicle (V2V) links. This paper investigates the reduction of the delay in edge information sharing for V2V links while satisfying the delay requirements of the V2I links. Specifically, a mean delay minimization problem and a maximum individual delay minimization problem are formulated to improve the global network performance and ensure the fairness of a single user, respectively. A multi-agent reinforcement learning framework is designed to solve these two problems, where a new reward function is proposed to evaluate the utilities of the two optimization objectives in a unified framework. Thereafter, a proximal policy optimization approach is proposed to enable each V2V user to learn its policy using the shared global network reward. The effectiveness of the proposed approach is finally validated by comparing the obtained results with those of the other baseline approaches through extensive simulation experiments.
Keywords: vehicular networks, edge information sharing, delay guarantee, multi-agent reinforcement learning, proximal policy optimization
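A minimal sketch of a unified reward of the kind the abstract above describes; the weights and the penalty are assumptions. It scores both objectives in one scalar: the mean V2V sharing delay (global performance) and the maximum individual delay (per-user fairness), while penalizing violations of the V2I delay requirement.

```python
import numpy as np

def unified_reward(v2v_delays, v2i_delay, v2i_budget,
                   w_mean=0.5, w_max=0.5, penalty=10.0):
    r = -(w_mean * np.mean(v2v_delays) + w_max * np.max(v2v_delays))
    if v2i_delay > v2i_budget:                 # the V2I requirement must still hold
        r -= penalty
    return r

print(unified_reward(v2v_delays=[3.0, 5.0, 4.0], v2i_delay=1.2, v2i_budget=2.0))  # -4.5
```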
19. Efficient Exploration for Multi-Agent Reinforcement Learning via Transferable Successor Features (cited: 1)
Authors: Wenzhang Liu, Lu Dong, Dan Niu, Changyin Sun. IEEE/CAA Journal of Automatica Sinica (SCIE, EI, CSCD), 2022, No. 9, pp. 1673-1686 (14 pages).
In multi-agent reinforcement learning (MARL), the behaviors of each agent can influence the learning of others, and the agents have to search in an exponentially enlarged joint-action space. Hence, it is challenging for multi-agent teams to explore the environment. Agents may achieve suboptimal policies and fail to solve some complex tasks. To improve the exploring efficiency as well as the performance on MARL tasks, in this paper, we propose a new approach by transferring knowledge across tasks. Differently from traditional MARL algorithms, we first assume that the reward functions can be computed by linear combinations of a shared feature function and a set of task-specific weights. Then, we define a set of basic MARL tasks in the source domain and pre-train them as the basic knowledge for further use. Finally, once the weights for target tasks are available, it becomes easier to get a well-performed policy to explore in the target domain. Hence, the learning process of agents for target tasks is sped up by making full use of the basic knowledge that was learned previously. We evaluate the proposed algorithm on two challenging MARL tasks: cooperative box-pushing and non-monotonic predator-prey. The experimental results have demonstrated the improved performance compared with state-of-the-art MARL algorithms.
Keywords: knowledge transfer, multi-agent systems, reinforcement learning, successor features
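A minimal sketch of the standard successor-feature identities behind the reward assumption above (toy numbers, not the paper's networks): if rewards decompose as r(s, a) = phi(s, a) . w, then Q(s, a) = psi(s, a) . w, where psi is the discounted accumulation of phi under the policy. A new target task then only needs its weight vector w to evaluate the pre-trained behavior.

```python
import numpy as np

def task_reward(phi_sa, w):
    return phi_sa @ w                     # r(s, a) = phi(s, a)^T w

def q_from_successor_features(psi_sa, w):
    return psi_sa @ w                     # Q(s, a) = psi(s, a)^T w, reused across tasks

phi = np.array([0.2, 0.0, 1.0])           # shared feature function output for (s, a)
psi = np.array([1.5, 0.3, 4.0])           # its discounted accumulation under the policy
w_target = np.array([1.0, -0.5, 2.0])     # weights defining the new target task
print(task_reward(phi, w_target), q_from_successor_features(psi, w_target))  # 2.2 9.35
```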
20. A Multi-Agent System for Environmental Monitoring Using Boolean Networks and Reinforcement Learning (cited: 6)
Authors: Hanzhong Zheng, Dejie Shi. Journal of Cyber Security, 2020, No. 2, pp. 85-96 (12 pages).
Distributed wireless sensor networks have been shown to be effective for environmental monitoring tasks, in which multiple sensors are deployed over a wide range of environments to collect information or monitor a particular event. Wireless sensor networks, consisting of a large number of interacting sensors, have been successful in a variety of applications where they are able to share information using different transmission protocols through the communication network. However, the irregular and dynamic environment requires traditional wireless sensor networks to have frequent communications to exchange the most recent information, which can easily generate high communication cost through collaborative data collection and data transmission. High-frequency communication also has a high probability of failure because of long-distance data transmission. In this paper, we developed a novel approach to a multi-sensor environment monitoring network using the idea of distributed systems. Its communication network can overcome the difficulties of high communication cost and Single Point of Failure (SPOF) through a decentralized approach, which performs in-network computation. Our approach makes use of Boolean networks, which allow for a non-complex method of corroboration and retain meaningful information regarding the dynamics of the communication network. Our approach also reduces the complexity of the data aggregation process and employs a reinforcement learning algorithm to predict future events inside the environment through pattern recognition.
Keywords: multi-agent system, reinforcement learning, environment monitoring
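A minimal sketch of how a Boolean network can corroborate detections in-network, with entirely hypothetical update rules (the paper's Boolean functions are not given in the abstract): each sensor node holds a Boolean state ("event detected?") and updates it from its neighbours' states, so corroboration happens without routing everything to a sink.

```python
import numpy as np

def boolean_network_step(states, adjacency):
    """states: 0/1 vector; a node reports 1 if it or a majority of its neighbours do."""
    new_states = states.copy()
    for i in range(len(states)):
        neigh = np.where(adjacency[i])[0]
        votes = states[neigh].sum()
        new_states[i] = 1 if states[i] == 1 or (len(neigh) and votes > len(neigh) / 2) else 0
    return new_states

adj = np.array([[0, 1, 1, 0], [1, 0, 1, 0], [1, 1, 0, 1], [0, 0, 1, 0]], dtype=bool)
s = np.array([1, 1, 0, 0])
print(boolean_network_step(s, adj))   # -> [1 1 1 0]; node 2 flips: most neighbours report the event
```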