期刊文献+
共找到99篇文章
< 1 2 5 >
每页显示 20 50 100
Regional Multi-Agent Cooperative Reinforcement Learning for City-Level Traffic Grid Signal
1
作者 Yisha Li Ya Zhang +1 位作者 Xinde Li Changyin Sun 《IEEE/CAA Journal of Automatica Sinica》 SCIE EI CSCD 2024年第9期1987-1998,共12页
This article studies the effective traffic signal control problem of multiple intersections in a city-level traffic system.A novel regional multi-agent cooperative reinforcement learning algorithm called RegionSTLight... This article studies the effective traffic signal control problem of multiple intersections in a city-level traffic system.A novel regional multi-agent cooperative reinforcement learning algorithm called RegionSTLight is proposed to improve the traffic efficiency.Firstly a regional multi-agent Q-learning framework is proposed,which can equivalently decompose the global Q value of the traffic system into the local values of several regions Based on the framework and the idea of human-machine cooperation,a dynamic zoning method is designed to divide the traffic network into several strong-coupled regions according to realtime traffic flow densities.In order to achieve better cooperation inside each region,a lightweight spatio-temporal fusion feature extraction network is designed.The experiments in synthetic real-world and city-level scenarios show that the proposed RegionS TLight converges more quickly,is more stable,and obtains better asymptotic performance compared to state-of-theart models. 展开更多
关键词 Human-machine cooperation mixed domain attention mechanism multi-agent reinforcement learning spatio-temporal feature traffic signal control
下载PDF
UAV-Assisted Dynamic Avatar Task Migration for Vehicular Metaverse Services: A Multi-Agent Deep Reinforcement Learning Approach 被引量:1
2
作者 Jiawen Kang Junlong Chen +6 位作者 Minrui Xu Zehui Xiong Yutao Jiao Luchao Han Dusit Niyato Yongju Tong Shengli Xie 《IEEE/CAA Journal of Automatica Sinica》 SCIE EI CSCD 2024年第2期430-445,共16页
Avatars, as promising digital representations and service assistants of users in Metaverses, can enable drivers and passengers to immerse themselves in 3D virtual services and spaces of UAV-assisted vehicular Metavers... Avatars, as promising digital representations and service assistants of users in Metaverses, can enable drivers and passengers to immerse themselves in 3D virtual services and spaces of UAV-assisted vehicular Metaverses. However, avatar tasks include a multitude of human-to-avatar and avatar-to-avatar interactive applications, e.g., augmented reality navigation,which consumes intensive computing resources. It is inefficient and impractical for vehicles to process avatar tasks locally. Fortunately, migrating avatar tasks to the nearest roadside units(RSU)or unmanned aerial vehicles(UAV) for execution is a promising solution to decrease computation overhead and reduce task processing latency, while the high mobility of vehicles brings challenges for vehicles to independently perform avatar migration decisions depending on current and future vehicle status. To address these challenges, in this paper, we propose a novel avatar task migration system based on multi-agent deep reinforcement learning(MADRL) to execute immersive vehicular avatar tasks dynamically. Specifically, we first formulate the problem of avatar task migration from vehicles to RSUs/UAVs as a partially observable Markov decision process that can be solved by MADRL algorithms. We then design the multi-agent proximal policy optimization(MAPPO) approach as the MADRL algorithm for the avatar task migration problem. To overcome slow convergence resulting from the curse of dimensionality and non-stationary issues caused by shared parameters in MAPPO, we further propose a transformer-based MAPPO approach via sequential decision-making models for the efficient representation of relationships among agents. Finally, to motivate terrestrial or non-terrestrial edge servers(e.g., RSUs or UAVs) to share computation resources and ensure traceability of the sharing records, we apply smart contracts and blockchain technologies to achieve secure sharing management. Numerical results demonstrate that the proposed approach outperforms the MAPPO approach by around 2% and effectively reduces approximately 20% of the latency of avatar task execution in UAV-assisted vehicular Metaverses. 展开更多
关键词 AVATAR blockchain metaverses multi-agent deep reinforcement learning transformer UAVS
下载PDF
Cooperative multi-target hunting by unmanned surface vehicles based on multi-agent reinforcement learning
3
作者 Jiawei Xia Yasong Luo +3 位作者 Zhikun Liu Yalun Zhang Haoran Shi Zhong Liu 《Defence Technology(防务技术)》 SCIE EI CAS CSCD 2023年第11期80-94,共15页
To solve the problem of multi-target hunting by an unmanned surface vehicle(USV)fleet,a hunting algorithm based on multi-agent reinforcement learning is proposed.Firstly,the hunting environment and kinematic model wit... To solve the problem of multi-target hunting by an unmanned surface vehicle(USV)fleet,a hunting algorithm based on multi-agent reinforcement learning is proposed.Firstly,the hunting environment and kinematic model without boundary constraints are built,and the criteria for successful target capture are given.Then,the cooperative hunting problem of a USV fleet is modeled as a decentralized partially observable Markov decision process(Dec-POMDP),and a distributed partially observable multitarget hunting Proximal Policy Optimization(DPOMH-PPO)algorithm applicable to USVs is proposed.In addition,an observation model,a reward function and the action space applicable to multi-target hunting tasks are designed.To deal with the dynamic change of observational feature dimension input by partially observable systems,a feature embedding block is proposed.By combining the two feature compression methods of column-wise max pooling(CMP)and column-wise average-pooling(CAP),observational feature encoding is established.Finally,the centralized training and decentralized execution framework is adopted to complete the training of hunting strategy.Each USV in the fleet shares the same policy and perform actions independently.Simulation experiments have verified the effectiveness of the DPOMH-PPO algorithm in the test scenarios with different numbers of USVs.Moreover,the advantages of the proposed model are comprehensively analyzed from the aspects of algorithm performance,migration effect in task scenarios and self-organization capability after being damaged,the potential deployment and application of DPOMH-PPO in the real environment is verified. 展开更多
关键词 Unmanned surface vehicles multi-agent deep reinforcement learning cooperative hunting Feature embedding Proximal policy optimization
下载PDF
Multi-Agent Deep Reinforcement Learning for Efficient Computation Offloading in Mobile Edge Computing
4
作者 Tianzhe Jiao Xiaoyue Feng +2 位作者 Chaopeng Guo Dongqi Wang Jie Song 《Computers, Materials & Continua》 SCIE EI 2023年第9期3585-3603,共19页
Mobile-edge computing(MEC)is a promising technology for the fifth-generation(5G)and sixth-generation(6G)architectures,which provides resourceful computing capabilities for Internet of Things(IoT)devices,such as virtua... Mobile-edge computing(MEC)is a promising technology for the fifth-generation(5G)and sixth-generation(6G)architectures,which provides resourceful computing capabilities for Internet of Things(IoT)devices,such as virtual reality,mobile devices,and smart cities.In general,these IoT applications always bring higher energy consumption than traditional applications,which are usually energy-constrained.To provide persistent energy,many references have studied the offloading problem to save energy consumption.However,the dynamic environment dramatically increases the optimization difficulty of the offloading decision.In this paper,we aim to minimize the energy consumption of the entireMECsystemunder the latency constraint by fully considering the dynamic environment.UnderMarkov games,we propose amulti-agent deep reinforcement learning approach based on the bi-level actorcritic learning structure to jointly optimize the offloading decision and resource allocation,which can solve the combinatorial optimization problem using an asymmetric method and compute the Stackelberg equilibrium as a better convergence point than Nash equilibrium in terms of Pareto superiority.Our method can better adapt to a dynamic environment during the data transmission than the single-agent strategy and can effectively tackle the coordination problem in the multi-agent environment.The simulation results show that the proposed method could decrease the total computational overhead by 17.8%compared to the actor-critic-based method and reduce the total computational overhead by 31.3%,36.5%,and 44.7%compared with randomoffloading,all local execution,and all offloading execution,respectively. 展开更多
关键词 Computation offloading multi-agent deep reinforcement learning mobile-edge computing latency energy efficiency
下载PDF
UAV Frequency-based Crowdsensing Using Grouping Multi-agent Deep Reinforcement Learning
5
作者 Cui ZHANG En WANG +2 位作者 Funing YANG Yong jian YANG Nan JIANG 《计算机科学》 CSCD 北大核心 2023年第2期57-68,共12页
Mobile CrowdSensing(MCS)is a promising sensing paradigm that recruits users to cooperatively perform sensing tasks.Recently,unmanned aerial vehicles(UAVs)as the powerful sensing devices are used to replace user partic... Mobile CrowdSensing(MCS)is a promising sensing paradigm that recruits users to cooperatively perform sensing tasks.Recently,unmanned aerial vehicles(UAVs)as the powerful sensing devices are used to replace user participation and carry out some special tasks,such as epidemic monitoring and earthquakes rescue.In this paper,we focus on scheduling UAVs to sense the task Point-of-Interests(PoIs)with different frequency coverage requirements.To accomplish the sensing task,the scheduling strategy needs to consider the coverage requirement,geographic fairness and energy charging simultaneously.We consider the complex interaction among UAVs and propose a grouping multi-agent deep reinforcement learning approach(G-MADDPG)to schedule UAVs distributively.G-MADDPG groups all UAVs into some teams by a distance-based clustering algorithm(DCA),then it regards each team as an agent.In this way,G-MADDPG solves the problem that the training time of traditional MADDPG is too long to converge when the number of UAVs is large,and the trade-off between training time and result accuracy could be controlled flexibly by adjusting the number of teams.Extensive simulation results show that our scheduling strategy has better performance compared with three baselines and is flexible in balancing training time and result accuracy. 展开更多
关键词 UAV Crowdsensing Frequency coverage Grouping multi-agent deep reinforcement learning
下载PDF
Cooperative Multi-Agent Reinforcement Learning with Constraint-Reduced DCOP
6
作者 Yi Xie Zhongyi Liu +1 位作者 Zhao Liu Yijun Gu 《Journal of Beijing Institute of Technology》 EI CAS 2017年第4期525-533,共9页
Cooperative multi-agent reinforcement learning( MARL) is an important topic in the field of artificial intelligence,in which distributed constraint optimization( DCOP) algorithms have been widely used to coordinat... Cooperative multi-agent reinforcement learning( MARL) is an important topic in the field of artificial intelligence,in which distributed constraint optimization( DCOP) algorithms have been widely used to coordinate the actions of multiple agents. However,dense communication among agents affects the practicability of DCOP algorithms. In this paper,we propose a novel DCOP algorithm dealing with the previous DCOP algorithms' communication problem by reducing constraints.The contributions of this paper are primarily threefold:(1) It is proved that removing constraints can effectively reduce the communication burden of DCOP algorithms.(2) An criterion is provided to identify insignificant constraints whose elimination doesn't have a great impact on the performance of the whole system.(3) A constraint-reduced DCOP algorithm is proposed by adopting a variant of spectral clustering algorithm to detect and eliminate the insignificant constraints. Our algorithm reduces the communication burdern of the benchmark DCOP algorithm while keeping its overall performance unaffected. The performance of constraint-reduced DCOP algorithm is evaluated on four configurations of cooperative sensor networks. The effectiveness of communication reduction is also verified by comparisons between the constraint-reduced DCOP and the benchmark DCOP. 展开更多
关键词 reinforcement learning cooperative multi-agent system distributed constraint optimization (DCOP) constraint-reduced DCOP
下载PDF
Deep reinforcement learning based multi-level dynamic reconfiguration for urban distribution network:a cloud-edge collaboration architecture
7
作者 Siyuan Jiang Hongjun Gao +2 位作者 Xiaohui Wang Junyong Liu Kunyu Zuo 《Global Energy Interconnection》 EI CAS CSCD 2023年第1期1-14,共14页
With the construction of the power Internet of Things(IoT),communication between smart devices in urban distribution networks has been gradually moving towards high speed,high compatibility,and low latency,which provi... With the construction of the power Internet of Things(IoT),communication between smart devices in urban distribution networks has been gradually moving towards high speed,high compatibility,and low latency,which provides reliable support for reconfiguration optimization in urban distribution networks.Thus,this study proposed a deep reinforcement learning based multi-level dynamic reconfiguration method for urban distribution networks in a cloud-edge collaboration architecture to obtain a real-time optimal multi-level dynamic reconfiguration solution.First,the multi-level dynamic reconfiguration method was discussed,which included feeder-,transformer-,and substation-levels.Subsequently,the multi-agent system was combined with the cloud-edge collaboration architecture to build a deep reinforcement learning model for multi-level dynamic reconfiguration in an urban distribution network.The cloud-edge collaboration architecture can effectively support the multi-agent system to conduct“centralized training and decentralized execution”operation modes and improve the learning efficiency of the model.Thereafter,for a multi-agent system,this study adopted a combination of offline and online learning to endow the model with the ability to realize automatic optimization and updation of the strategy.In the offline learning phase,a Q-learning-based multi-agent conservative Q-learning(MACQL)algorithm was proposed to stabilize the learning results and reduce the risk of the next online learning phase.In the online learning phase,a multi-agent deep deterministic policy gradient(MADDPG)algorithm based on policy gradients was proposed to explore the action space and update the experience pool.Finally,the effectiveness of the proposed method was verified through a simulation analysis of a real-world 445-node system. 展开更多
关键词 Cloud-edge collaboration architecture multi-agent deep reinforcement learning Multi-level dynamic reconfiguration Offline learning Online learning
下载PDF
Applications and Challenges of Deep Reinforcement Learning in Multi-robot Path Planning 被引量:1
8
作者 Tianyun Qiu Yaxuan Cheng 《Journal of Electronic Research and Application》 2021年第6期25-29,共5页
With the rapid advancement of deep reinforcement learning(DRL)in multi-agent systems,a variety of practical application challenges and solutions in the direction of multi-agent deep reinforcement learning(MADRL)are su... With the rapid advancement of deep reinforcement learning(DRL)in multi-agent systems,a variety of practical application challenges and solutions in the direction of multi-agent deep reinforcement learning(MADRL)are surfacing.Path planning in a collision-free environment is essential for many robots to do tasks quickly and efficiently,and path planning for multiple robots using deep reinforcement learning is a new research area in the field of robotics and artificial intelligence.In this paper,we sort out the training methods for multi-robot path planning,as well as summarize the practical applications in the field of DRL-based multi-robot path planning based on the methods;finally,we suggest possible research directions for researchers. 展开更多
关键词 MADRL deep reinforcement learning multi-agent system MULTI-ROBOT Path planning
下载PDF
Deep Reinforcement Learning for Addressing Disruptions in Traffic Light Control
9
作者 Faizan Rasheed Kok-Lim Alvin Yau +1 位作者 Rafidah Md Noor Yung-Wey Chong 《Computers, Materials & Continua》 SCIE EI 2022年第5期2225-2247,共23页
This paper investigates the use of multi-agent deep Q-network(MADQN)to address the curse of dimensionality issue occurred in the traditional multi-agent reinforcement learning(MARL)approach.The proposed MADQN is appli... This paper investigates the use of multi-agent deep Q-network(MADQN)to address the curse of dimensionality issue occurred in the traditional multi-agent reinforcement learning(MARL)approach.The proposed MADQN is applied to traffic light controllers at multiple intersections with busy traffic and traffic disruptions,particularly rainfall.MADQN is based on deep Q-network(DQN),which is an integration of the traditional reinforcement learning(RL)and the newly emerging deep learning(DL)approaches.MADQN enables traffic light controllers to learn,exchange knowledge with neighboring agents,and select optimal joint actions in a collaborative manner.A case study based on a real traffic network is conducted as part of a sustainable urban city project in the Sunway City of Kuala Lumpur in Malaysia.Investigation is also performed using a grid traffic network(GTN)to understand that the proposed scheme is effective in a traditional traffic network.Our proposed scheme is evaluated using two simulation tools,namely Matlab and Simulation of Urban Mobility(SUMO).Our proposed scheme has shown that the cumulative delay of vehicles can be reduced by up to 30%in the simulations. 展开更多
关键词 Artificial intelligence traffic light control traffic disruptions multi-agent deep Q-network deep reinforcement learning
下载PDF
Learning-based user association and dynamic resource allocation in multi-connectivity enabled unmanned aerial vehicle networks
10
作者 Zhipeng Cheng Minghui Liwang +3 位作者 Ning Chen Lianfen Huang Nadra Guizani Xiaojiang Du 《Digital Communications and Networks》 SCIE CSCD 2024年第1期53-62,共10页
Unmanned Aerial Vehicles(UAvs)as aerial base stations to provide communication services for ground users is a flexible and cost-effective paradigm in B5G.Besides,dynamic resource allocation and multi-connectivity can ... Unmanned Aerial Vehicles(UAvs)as aerial base stations to provide communication services for ground users is a flexible and cost-effective paradigm in B5G.Besides,dynamic resource allocation and multi-connectivity can be adopted to further harness the potentials of UAVs in improving communication capacity,in such situations such that the interference among users becomes a pivotal disincentive requiring effective solutions.To this end,we investigate the Joint UAV-User Association,Channel Allocation,and transmission Power Control(J-UACAPC)problem in a multi-connectivity-enabled UAV network with constrained backhaul links,where each UAV can determine the reusable channels and transmission power to serve the selected ground users.The goal was to mitigate co-channel interference while maximizing long-term system utility.The problem was modeled as a cooperative stochastic game with hybrid discrete-continuous action space.A Multi-Agent Hybrid Deep Reinforcement Learning(MAHDRL)algorithm was proposed to address this problem.Extensive simulation results demonstrated the effectiveness of the proposed algorithm and showed that it has a higher system utility than the baseline methods. 展开更多
关键词 UAV-user association Multi-connectivity Resource allocation Power control multi-agent deep reinforcement learning
下载PDF
Targeted multi-agent communication algorithm based on state control
11
作者 Li-yang Zhao Tian-qing Chang +3 位作者 Lei Zhang Jie Zhang Kai-xuan Chu De-peng Kong 《Defence Technology(防务技术)》 SCIE EI CAS CSCD 2024年第1期544-556,共13页
As an important mechanism in multi-agent interaction,communication can make agents form complex team relationships rather than constitute a simple set of multiple independent agents.However,the existing communication ... As an important mechanism in multi-agent interaction,communication can make agents form complex team relationships rather than constitute a simple set of multiple independent agents.However,the existing communication schemes can bring much timing redundancy and irrelevant messages,which seriously affects their practical application.To solve this problem,this paper proposes a targeted multiagent communication algorithm based on state control(SCTC).The SCTC uses a gating mechanism based on state control to reduce the timing redundancy of communication between agents and determines the interaction relationship between agents and the importance weight of a communication message through a series connection of hard-and self-attention mechanisms,realizing targeted communication message processing.In addition,by minimizing the difference between the fusion message generated from a real communication message of each agent and a fusion message generated from the buffered message,the correctness of the final action choice of the agent is ensured.Our evaluation using a challenging set of Star Craft II benchmarks indicates that the SCTC can significantly improve the learning performance and reduce the communication overhead between agents,thus ensuring better cooperation between agents. 展开更多
关键词 multi-agent deep reinforcement learning State control Targeted interaction Communication mechanism
下载PDF
Reward Function Design Method for Long Episode Pursuit Tasks Under Polar Coordinate in Multi-Agent Reinforcement Learning
12
作者 DONG Yubo CUI Tao +3 位作者 ZHOU Yufan SONG Xun ZHU Yue DONG Peng 《Journal of Shanghai Jiaotong university(Science)》 EI 2024年第4期646-655,共10页
Multi-agent reinforcement learning has recently been applied to solve pursuit problems.However,it suffers from a large number of time steps per training episode,thus always struggling to converge effectively,resulting... Multi-agent reinforcement learning has recently been applied to solve pursuit problems.However,it suffers from a large number of time steps per training episode,thus always struggling to converge effectively,resulting in low rewards and an inability for agents to learn strategies.This paper proposes a deep reinforcement learning(DRL)training method that employs an ensemble segmented multi-reward function design approach to address the convergence problem mentioned before.The ensemble reward function combines the advantages of two reward functions,which enhances the training effect of agents in long episode.Then,we eliminate the non-monotonic behavior in reward function introduced by the trigonometric functions in the traditional 2D polar coordinates observation representation.Experimental results demonstrate that this method outperforms the traditional single reward function mechanism in the pursuit scenario by enhancing agents’policy scores of the task.These ideas offer a solution to the convergence challenges faced by DRL models in long episode pursuit problems,leading to an improved model training performance. 展开更多
关键词 multi-agent reinforcement learning deep reinforcement learning(DRL) long episode reward function
原文传递
MAQMC:Multi-Agent Deep Q-Network for Multi-Zone Residential HVAC Control
13
作者 Zhengkai Ding Qiming Fu +4 位作者 Jianping Chen You Lu Hongjie Wu Nengwei Fang Bin Xing 《Computer Modeling in Engineering & Sciences》 SCIE EI 2023年第9期2759-2785,共27页
The optimization of multi-zone residential heating,ventilation,and air conditioning(HVAC)control is not an easy task due to its complex dynamic thermal model and the uncertainty of occupant-driven cooling loads.Deep r... The optimization of multi-zone residential heating,ventilation,and air conditioning(HVAC)control is not an easy task due to its complex dynamic thermal model and the uncertainty of occupant-driven cooling loads.Deep reinforcement learning(DRL)methods have recently been proposed to address the HVAC control problem.However,the application of single-agent DRL formulti-zone residential HVAC controlmay lead to non-convergence or slow convergence.In this paper,we propose MAQMC(Multi-Agent deep Q-network for multi-zone residential HVAC Control)to address this challenge with the goal of minimizing energy consumption while maintaining occupants’thermal comfort.MAQMC is divided into MAQMC2(MAQMC with two agents:one agent controls the temperature of each zone,and the other agent controls the humidity of each zone)and MAQMC3(MAQMC with three agents:three agents control the temperature and humidity of three zones,respectively).The experimental results showthatMAQMC3 can reduce energy consumption by 6.27%andMAQMC2 by 3.73%compared with the fixed point;compared with the rule-based,MAQMC3 andMAQMC2 respectively can reduce 61.89%and 59.07%comfort violation.In addition,experiments with different regional weather data demonstrate that the well-trained MAQMC RL agents have the robustness and adaptability to unknown environments. 展开更多
关键词 deep reinforcement learning multi-zone residential HVAC multi-agent energy conservation comfort
下载PDF
Multi-Robot Task Allocation Using Multimodal Multi-Objective Evolutionary Algorithm Based on Deep Reinforcement Learning
14
作者 苗镇华 黄文焘 +1 位作者 张依恋 范勤勤 《Journal of Shanghai Jiaotong university(Science)》 EI 2024年第3期377-387,共11页
The overall performance of multi-robot collaborative systems is significantly affected by the multi-robot task allocation.To improve the effectiveness,robustness,and safety of multi-robot collaborative systems,a multi... The overall performance of multi-robot collaborative systems is significantly affected by the multi-robot task allocation.To improve the effectiveness,robustness,and safety of multi-robot collaborative systems,a multimodal multi-objective evolutionary algorithm based on deep reinforcement learning is proposed in this paper.The improved multimodal multi-objective evolutionary algorithm is used to solve multi-robot task allo-cation problems.Moreover,a deep reinforcement learning strategy is used in the last generation to provide a high-quality path for each assigned robot via an end-to-end manner.Comparisons with three popular multimodal multi-objective evolutionary algorithms on three different scenarios of multi-robot task allocation problems are carried out to verify the performance of the proposed algorithm.The experimental test results show that the proposed algorithm can generate sufficient equivalent schemes to improve the availability and robustness of multi-robot collaborative systems in uncertain environments,and also produce the best scheme to improve the overall task execution efficiency of multi-robot collaborative systems. 展开更多
关键词 multi-robot task allocation multi-robot cooperation path planning multimodal multi-objective evo-lutionary algorithm deep reinforcement learning
原文传递
Optimal Secondary Control of Islanded AC Microgrids with Communication Time-delay Based on Multi-agent Deep Reinforcement Learning
15
作者 Yang Xia Yan Xu +3 位作者 Yu Wang Suman Mondal Souvik Dasgupta Amit.K.Gupta 《CSEE Journal of Power and Energy Systems》 SCIE EI CSCD 2023年第4期1301-1311,共11页
In this paper,an optimal secondary control strategy is proposed for islanded AC microgrids considering communi-cation time-delays.The proposed method is designed based on the data-driven principle,which consists of an... In this paper,an optimal secondary control strategy is proposed for islanded AC microgrids considering communi-cation time-delays.The proposed method is designed based on the data-driven principle,which consists of an offine training phase and online application phase.For offline training,each control agent is formulated by a deep neural network(DNN)and trained based on a multi-agent deep reinforcement learning(MA-DRL)framework.A deep deterministic policy gradient(DDPG)algorithm is improved and applied to search for an optimal policy of the secondary control,where a global cost function is developed to evaluate the overall control performance.In addition,the communication time-delay is introduced in the system to enrich training scenarios,which aims to solve the time-delay problem in the secondary control.For the online stage,each controller is deployed in a distributed way which only requires local and neighboring information for each DG.Based on this,the well-trained controllers can provide optimal solutions under load variations,and communication time-delays for online applications.Several case studies are conducted to validate the feasibility and stability of the proposed secondary control.Index Terms-Communication time-delay,global cost function,islanded AC microgrid,multi-agent deep reinforcement learning(MA-DRL),secondary control. 展开更多
关键词 Communication time-delay global cost function islanded AC microgrid multi-agent deep reinforcement learning(MA-DRL) secondary control.
原文传递
A multi-agent deep reinforcement learning approach for solving the multi-depot vehicle routing problem
16
作者 Ali Arishi Krishna Krishnan 《Journal of Management Analytics》 EI 2023年第3期493-515,共23页
The multi-depot vehicle routing problem(MDVRP)is one of the most essential and useful variants of the traditional vehicle routing problem(VRP)in supply chain management(SCM)and logistics studies.Many supply chains(SC)... The multi-depot vehicle routing problem(MDVRP)is one of the most essential and useful variants of the traditional vehicle routing problem(VRP)in supply chain management(SCM)and logistics studies.Many supply chains(SC)choose the joint distribution of multiple depots to cut transportation costs and delivery times.However,the ability to deliver quality and fast solutions for MDVRP remains a challenging task.Traditional optimization approaches in operation research(OR)may not be practical to solve MDVRP in real-time.With the latest developments in artificial intelligence(AI),it becomes feasible to apply deep reinforcement learning(DRL)for solving combinatorial routing problems.This paper proposes a new multi-agent deep reinforcement learning(MADRL)model to solve MDVRP.Extensive experiments are conducted to evaluate the performance of the proposed approach.Results show that the developed MADRL model can rapidly capture relative information embedded in graphs and effectively produce quality solutions in real-time. 展开更多
关键词 artificial intelligence supply chain management combinatorial optimization multi-depot vehicle routing problem multi-agent deep reinforcement learning
原文传递
Automatic depth matching method of well log based on deep reinforcement learning
17
作者 XIONG Wenjun XIAO Lizhi +1 位作者 YUAN Jiangru YUE Wenzheng 《Petroleum Exploration and Development》 SCIE 2024年第3期634-646,共13页
In the traditional well log depth matching tasks,manual adjustments are required,which means significantly labor-intensive for multiple wells,leading to low work efficiency.This paper introduces a multi-agent deep rei... In the traditional well log depth matching tasks,manual adjustments are required,which means significantly labor-intensive for multiple wells,leading to low work efficiency.This paper introduces a multi-agent deep reinforcement learning(MARL)method to automate the depth matching of multi-well logs.This method defines multiple top-down dual sliding windows based on the convolutional neural network(CNN)to extract and capture similar feature sequences on well logs,and it establishes an interaction mechanism between agents and the environment to control the depth matching process.Specifically,the agent selects an action to translate or scale the feature sequence based on the double deep Q-network(DDQN).Through the feedback of the reward signal,it evaluates the effectiveness of each action,aiming to obtain the optimal strategy and improve the accuracy of the matching task.Our experiments show that MARL can automatically perform depth matches for well-logs in multiple wells,and reduce manual intervention.In the application to the oil field,a comparative analysis of dynamic time warping(DTW),deep Q-learning network(DQN),and DDQN methods revealed that the DDQN algorithm,with its dual-network evaluation mechanism,significantly improves performance by identifying and aligning more details in the well log feature sequences,thus achieving higher depth matching accuracy. 展开更多
关键词 artificial intelligence machine learning depth matching well log multi-agent deep reinforcement learning convolutional neural network double deep Q-network
下载PDF
Improving multi-target cooperative tracking guidance for UAV swarms using multi-agent reinforcement learning 被引量:7
18
作者 Wenhong ZHOU Jie LI +1 位作者 Zhihong LIU Lincheng SHEN 《Chinese Journal of Aeronautics》 SCIE EI CAS CSCD 2022年第7期100-112,共13页
Multi-Target Tracking Guidance(MTTG)in unknown environments has great potential values in applications for Unmanned Aerial Vehicle(UAV)swarms.Although Multi-Agent Deep Reinforcement Learning(MADRL)is a promising techn... Multi-Target Tracking Guidance(MTTG)in unknown environments has great potential values in applications for Unmanned Aerial Vehicle(UAV)swarms.Although Multi-Agent Deep Reinforcement Learning(MADRL)is a promising technique for learning cooperation,most of the existing methods cannot scale well to decentralized UAV swarms due to their computational complexity or global information requirement.This paper proposes a decentralized MADRL method using the maximum reciprocal reward to learn cooperative tracking policies for UAV swarms.This method reshapes each UAV’s reward with a regularization term that is defined as the dot product of the reward vector of all neighbor UAVs and the corresponding dependency vector between the UAV and the neighbors.And the dependence between UAVs can be directly captured by the Pointwise Mutual Information(PMI)neural network without complicated aggregation statistics.Then,the experience sharing Reciprocal Reward Multi-Agent Actor-Critic(MAAC-R)algorithm is proposed to learn the cooperative sharing policy for all homogeneous UAVs.Experiments demonstrate that the proposed algorithm can improve the UAVs’cooperation more effectively than the baseline algorithms,and can stimulate a rich form of cooperative tracking behaviors of UAV swarms.Besides,the learned policy can better scale to other scenarios with more UAVs and targets. 展开更多
关键词 Decentralized cooperation Maximum reciprocal reward multi-agent actor-critic Pointwise mutual information reinforcement learning
原文传递
Learning-Based Joint Service Caching and Load Balancing for MEC Blockchain Networks
19
作者 Wenqian Zhang Wenya Fan +1 位作者 Guanglin Zhang Shiwen Mao 《China Communications》 SCIE CSCD 2023年第1期125-139,共15页
Integrating the blockchain technology into mobile-edge computing(MEC)networks with multiple cooperative MEC servers(MECS)providing a promising solution to improving resource utilization,and helping establish a secure ... Integrating the blockchain technology into mobile-edge computing(MEC)networks with multiple cooperative MEC servers(MECS)providing a promising solution to improving resource utilization,and helping establish a secure reward mechanism that can facilitate load balancing among MECS.In addition,intelligent management of service caching and load balancing can improve the network utility in MEC blockchain networks with multiple types of workloads.In this paper,we investigate a learningbased joint service caching and load balancing policy for optimizing the communication and computation resources allocation,so as to improve the resource utilization of MEC blockchain networks.We formulate the problem as a challenging long-term network revenue maximization Markov decision process(MDP)problem.To address the highly dynamic and high dimension of system states,we design a joint service caching and load balancing algorithm based on the double-dueling Deep Q network(DQN)approach.The simulation results validate the feasibility and superior performance of our proposed algorithm over several baseline schemes. 展开更多
关键词 cooperative mobile-edge computing blockchain workload offloading service caching load balancing deep reinforcement learning(DRL)
下载PDF
Locally generalised multi-agent reinforcement learning for demand and capacity balancing with customised neural networks
20
作者 Yutong CHEN Minghua HU +1 位作者 Yan XU Lei YANG 《Chinese Journal of Aeronautics》 SCIE EI CAS CSCD 2023年第4期338-353,共16页
Reinforcement Learning(RL)techniques are being studied to solve the Demand and Capacity Balancing(DCB)problems to fully exploit their computational performance.A locally gen-eralised Multi-Agent Reinforcement Learning... Reinforcement Learning(RL)techniques are being studied to solve the Demand and Capacity Balancing(DCB)problems to fully exploit their computational performance.A locally gen-eralised Multi-Agent Reinforcement Learning(MARL)for real-world DCB problems is proposed.The proposed method can deploy trained agents directly to unseen scenarios in a specific Air Traffic Flow Management(ATFM)region to quickly obtain a satisfactory solution.In this method,agents of all flights in a scenario form a multi-agent decision-making system based on partial observation.The trained agent with the customised neural network can be deployed directly on the corresponding flight,allowing it to solve the DCB problem jointly.A cooperation coefficient is introduced in the reward function,which is used to adjust the agent’s cooperation preference in a multi-agent system,thereby controlling the distribution of flight delay time allocation.A multi-iteration mechanism is designed for the DCB decision-making framework to deal with problems arising from non-stationarity in MARL and to ensure that all hotspots are eliminated.Experiments based on large-scale high-complexity real-world scenarios are conducted to verify the effectiveness and efficiency of the method.From a statis-tical point of view,it is proven that the proposed method is generalised within the scope of the flights and sectors of interest,and its optimisation performance outperforms the standard computer-assisted slot allocation and state-of-the-art RL-based DCB methods.The sensitivity analysis preliminarily reveals the effect of the cooperation coefficient on delay time allocation. 展开更多
关键词 Air traffic flow management Demand and capacity bal-ancing deep Q-learning network Flight delays GENERALISATION Ground delay program multi-agent reinforcement learning
原文传递
上一页 1 2 5 下一页 到第
使用帮助 返回顶部