Journal Articles
37 articles found
1. UAV-Assisted Dynamic Avatar Task Migration for Vehicular Metaverse Services: A Multi-Agent Deep Reinforcement Learning Approach (Cited by 1)
Authors: Jiawen Kang, Junlong Chen, Minrui Xu, Zehui Xiong, Yutao Jiao, Luchao Han, Dusit Niyato, Yongju Tong, Shengli Xie. IEEE/CAA Journal of Automatica Sinica (SCIE, EI, CSCD), 2024, No. 2, pp. 430-445 (16 pages).
Avatars, as promising digital representations and service assistants of users in Metaverses, can enable drivers and passengers to immerse themselves in 3D virtual services and spaces of UAV-assisted vehicular Metaverses. However, avatar tasks include a multitude of human-to-avatar and avatar-to-avatar interactive applications, e.g., augmented reality navigation, which consume intensive computing resources. It is inefficient and impractical for vehicles to process avatar tasks locally. Fortunately, migrating avatar tasks to the nearest roadside units (RSUs) or unmanned aerial vehicles (UAVs) for execution is a promising solution to decrease computation overhead and reduce task processing latency. However, the high mobility of vehicles makes it challenging for them to independently make avatar migration decisions based on current and future vehicle status. To address these challenges, this paper proposes a novel avatar task migration system based on multi-agent deep reinforcement learning (MADRL) to execute immersive vehicular avatar tasks dynamically. Specifically, we first formulate the problem of avatar task migration from vehicles to RSUs/UAVs as a partially observable Markov decision process that can be solved by MADRL algorithms. We then design the multi-agent proximal policy optimization (MAPPO) approach as the MADRL algorithm for the avatar task migration problem. To overcome the slow convergence resulting from the curse of dimensionality and the non-stationarity caused by shared parameters in MAPPO, we further propose a transformer-based MAPPO approach via sequential decision-making models for the efficient representation of relationships among agents. Finally, to motivate terrestrial or non-terrestrial edge servers (e.g., RSUs or UAVs) to share computation resources and to ensure traceability of the sharing records, we apply smart contracts and blockchain technologies to achieve secure sharing management. Numerical results demonstrate that the proposed approach outperforms the MAPPO approach by around 2% and reduces the latency of avatar task execution by approximately 20% in UAV-assisted vehicular Metaverses.
Keywords: avatar, blockchain, Metaverses, multi-agent deep reinforcement learning, transformer, UAVs
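At the core of MAPPO (and its transformer-based variant above) is PPO's clipped surrogate objective, computed per agent against a centralized value baseline. The following is a minimal sketch of that clipped loss for a single sample; the function name and the default clip range `eps=0.2` are illustrative assumptions, not taken from the paper.

```python
import math

def ppo_clip_loss(logp_new, logp_old, advantage, eps=0.2):
    """PPO clipped surrogate loss for one (state, action) sample.

    The probability ratio r = pi_new(a|s) / pi_old(a|s) is clipped to
    [1 - eps, 1 + eps] so a single policy update cannot move the policy
    too far from the behavior policy that collected the data.
    """
    ratio = math.exp(logp_new - logp_old)
    clipped = max(min(ratio, 1.0 + eps), 1.0 - eps)
    # PPO maximizes the minimum of the two surrogates; as a loss we negate it.
    return -min(ratio * advantage, clipped * advantage)
```

With an unchanged policy (ratio 1) the loss is simply the negated advantage; with a ratio of 2 and a positive advantage, the clipped branch caps the improvement signal at `1 + eps`.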
2. Automatic depth matching method of well log based on deep reinforcement learning
Authors: Xiong Wenjun, Xiao Lizhi, Yuan Jiangru, Yue Wenzheng. Petroleum Exploration and Development (SCIE), 2024, No. 3, pp. 634-646 (13 pages).
In traditional well log depth matching tasks, manual adjustments are required, which is significantly labor-intensive for multiple wells and leads to low work efficiency. This paper introduces a multi-agent deep reinforcement learning (MARL) method to automate the depth matching of multi-well logs. The method defines multiple top-down dual sliding windows based on a convolutional neural network (CNN) to extract and capture similar feature sequences on well logs, and it establishes an interaction mechanism between agents and the environment to control the depth matching process. Specifically, each agent selects an action to translate or scale the feature sequence based on the double deep Q-network (DDQN). Through the feedback of the reward signal, it evaluates the effectiveness of each action, aiming to obtain the optimal strategy and improve the accuracy of the matching task. Our experiments show that MARL can automatically perform depth matching for well logs in multiple wells and reduce manual intervention. In an oil field application, a comparative analysis of dynamic time warping (DTW), deep Q-learning network (DQN), and DDQN methods revealed that the DDQN algorithm, with its dual-network evaluation mechanism, significantly improves performance by identifying and aligning more details in the well log feature sequences, thus achieving higher depth matching accuracy.
Keywords: artificial intelligence, machine learning, depth matching, well log, multi-agent deep reinforcement learning, convolutional neural network, double deep Q-network
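The "dual-network evaluation mechanism" of DDQN mentioned above is the standard double DQN target: the online network picks the greedy next action and the target network evaluates it. A minimal sketch (function name and list-based Q-values are illustrative, not the paper's implementation):

```python
def ddqn_target(reward, gamma, q_online_next, q_target_next, done):
    """Double DQN target value for one transition.

    The online network selects the greedy next action; the target network
    evaluates it. Decoupling selection from evaluation reduces the
    overestimation bias of vanilla DQN's max operator.
    """
    if done:
        return reward
    best_action = max(range(len(q_online_next)), key=lambda a: q_online_next[a])
    return reward + gamma * q_target_next[best_action]
```

Note that the returned target can differ substantially from `reward + gamma * max(q_target_next)`, which is exactly the overestimation vanilla DQN suffers from.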
3. Multi-Agent Deep Reinforcement Learning for Efficient Computation Offloading in Mobile Edge Computing
Authors: Tianzhe Jiao, Xiaoyue Feng, Chaopeng Guo, Dongqi Wang, Jie Song. Computers, Materials & Continua (SCIE, EI), 2023, No. 9, pp. 3585-3603 (19 pages).
Mobile-edge computing (MEC) is a promising technology for fifth-generation (5G) and sixth-generation (6G) architectures, providing resourceful computing capabilities for Internet of Things (IoT) applications such as virtual reality, mobile devices, and smart cities. In general, these IoT applications bring higher energy consumption than traditional applications while usually being energy-constrained. To provide persistent energy, many studies have examined the offloading problem to save energy consumption. However, the dynamic environment dramatically increases the optimization difficulty of the offloading decision. In this paper, we aim to minimize the energy consumption of the entire MEC system under a latency constraint while fully considering the dynamic environment. Under Markov games, we propose a multi-agent deep reinforcement learning approach based on the bi-level actor-critic learning structure to jointly optimize the offloading decision and resource allocation. It solves the combinatorial optimization problem using an asymmetric method and computes the Stackelberg equilibrium, a better convergence point than the Nash equilibrium in terms of Pareto superiority. Our method adapts better to a dynamic environment during data transmission than a single-agent strategy and effectively tackles the coordination problem in the multi-agent environment. The simulation results show that the proposed method decreases the total computational overhead by 17.8% compared to the actor-critic-based method, and by 31.3%, 36.5%, and 44.7% compared with random offloading, all-local execution, and all-offloading execution, respectively.
Keywords: computation offloading, multi-agent deep reinforcement learning, mobile-edge computing, latency, energy efficiency
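The Stackelberg equilibrium targeted by the bi-level structure above differs from a Nash equilibrium in that one side (the leader) commits first and the other (the follower) best-responds. For a small finite game it can be found by plain enumeration; this sketch is a generic illustration of the concept, not the paper's asymmetric learning method.

```python
def stackelberg(leader_payoff, follower_payoff):
    """Pure-strategy Stackelberg equilibrium by enumeration.

    leader_payoff[i][j] / follower_payoff[i][j] are the payoffs when the
    leader plays i and the follower plays j. The leader commits to the
    action whose payoff is best once the follower has best-responded.
    """
    best = None
    for i in range(len(leader_payoff)):
        # Follower's best response to leader action i.
        j = max(range(len(follower_payoff[i])), key=lambda a: follower_payoff[i][a])
        if best is None or leader_payoff[i][j] > leader_payoff[best[0]][best[1]]:
            best = (i, j)
    return best
```

In the bi-level actor-critic setting, one agent's policy plays the leader role and the other is trained as the best-responding follower, which is what makes the asymmetric solution computable.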
4. UAV Frequency-based Crowdsensing Using Grouping Multi-agent Deep Reinforcement Learning
Authors: Cui Zhang, En Wang, Funing Yang, Yongjian Yang, Nan Jiang. Computer Science (计算机科学) (CSCD, PKU Core), 2023, No. 2, pp. 57-68 (12 pages).
Mobile CrowdSensing (MCS) is a promising sensing paradigm that recruits users to cooperatively perform sensing tasks. Recently, unmanned aerial vehicles (UAVs), as powerful sensing devices, have been used to replace user participation and carry out special tasks such as epidemic monitoring and earthquake rescue. In this paper, we focus on scheduling UAVs to sense task Point-of-Interests (PoIs) with different frequency coverage requirements. To accomplish the sensing task, the scheduling strategy needs to consider the coverage requirement, geographic fairness, and energy charging simultaneously. We consider the complex interaction among UAVs and propose a grouping multi-agent deep reinforcement learning approach (G-MADDPG) to schedule UAVs distributively. G-MADDPG groups all UAVs into teams using a distance-based clustering algorithm (DCA) and then regards each team as an agent. In this way, G-MADDPG solves the problem that the training time of traditional MADDPG is too long to converge when the number of UAVs is large, and the trade-off between training time and result accuracy can be controlled flexibly by adjusting the number of teams. Extensive simulation results show that our scheduling strategy outperforms three baselines and is flexible in balancing training time and result accuracy.
Keywords: UAV, crowdsensing, frequency coverage, grouping multi-agent deep reinforcement learning
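The abstract does not spell out the DCA clustering rule, so the following is only a hypothetical stand-in: a greedy distance-based grouping in which a UAV joins the first team whose anchor lies within a radius, otherwise founding a new team. Function name, anchor rule, and radius parameter are all assumptions for illustration.

```python
import math

def group_uavs(positions, radius):
    """Greedy distance-based grouping of UAVs into teams.

    Each UAV joins the first existing team whose anchor (the team's first
    member) is within `radius`; otherwise it starts a new team. Each
    resulting team would then act as one MADDPG agent.
    """
    teams = []  # each team is a list of UAV indices
    for idx, (x, y) in enumerate(positions):
        for team in teams:
            ax, ay = positions[team[0]]
            if math.hypot(x - ax, y - ay) <= radius:
                team.append(idx)
                break
        else:
            teams.append([idx])
    return teams
```

Shrinking the radius yields more, smaller teams (slower training, finer control); growing it yields fewer agents, which is exactly the training-time/accuracy trade-off the paper tunes via the number of teams.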
5. Cooperative multi-target hunting by unmanned surface vehicles based on multi-agent reinforcement learning
Authors: Jiawei Xia, Yasong Luo, Zhikun Liu, Yalun Zhang, Haoran Shi, Zhong Liu. Defence Technology (SCIE, EI, CAS, CSCD), 2023, No. 11, pp. 80-94 (15 pages).
To solve the problem of multi-target hunting by an unmanned surface vehicle (USV) fleet, a hunting algorithm based on multi-agent reinforcement learning is proposed. First, the hunting environment and a kinematic model without boundary constraints are built, and the criteria for successful target capture are given. Then, the cooperative hunting problem of a USV fleet is modeled as a decentralized partially observable Markov decision process (Dec-POMDP), and a Distributed Partially Observable Multi-target Hunting Proximal Policy Optimization (DPOMH-PPO) algorithm applicable to USVs is proposed. In addition, an observation model, a reward function, and an action space applicable to multi-target hunting tasks are designed. To deal with the dynamically changing dimension of observational features in partially observable systems, a feature embedding block is proposed: by combining two feature compression methods, column-wise max pooling (CMP) and column-wise average pooling (CAP), an observational feature encoding is established. Finally, the centralized training and decentralized execution framework is adopted to train the hunting strategy; each USV in the fleet shares the same policy and performs actions independently. Simulation experiments verify the effectiveness of the DPOMH-PPO algorithm in test scenarios with different numbers of USVs. Moreover, the advantages of the proposed model are comprehensively analyzed in terms of algorithm performance, migration across task scenarios, and self-organization capability after damage, and the potential deployment and application of DPOMH-PPO in real environments is verified.
Keywords: unmanned surface vehicles, multi-agent deep reinforcement learning, cooperative hunting, feature embedding, proximal policy optimization
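The CMP/CAP feature embedding above compresses a variable number of observed rows (one per target) into a fixed-length vector by taking per-column max and mean and concatenating them. A minimal sketch under that reading (the function name is illustrative):

```python
def embed_observations(rows):
    """Column-wise max pooling (CMP) + column-wise average pooling (CAP).

    `rows` is a non-empty list of equal-length feature vectors, one per
    observed target. The output length is 2 * dim regardless of how many
    targets are currently observed, which lets a fixed-size policy network
    consume a varying number of observations.
    """
    n, dim = len(rows), len(rows[0])
    cmp_vec = [max(row[c] for row in rows) for c in range(dim)]
    cap_vec = [sum(row[c] for row in rows) / n for c in range(dim)]
    return cmp_vec + cap_vec
```

The key property is permutation invariance: reordering the observed targets leaves the embedding unchanged, which matches a fleet of interchangeable USVs.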
6. Deep reinforcement learning based multi-level dynamic reconfiguration for urban distribution network: a cloud-edge collaboration architecture (Cited by 1)
Authors: Siyuan Jiang, Hongjun Gao, Xiaohui Wang, Junyong Liu, Kunyu Zuo. Global Energy Interconnection (EI, CAS, CSCD), 2023, No. 1, pp. 1-14 (14 pages).
With the construction of the power Internet of Things (IoT), communication between smart devices in urban distribution networks has been gradually moving towards high speed, high compatibility, and low latency, which provides reliable support for reconfiguration optimization in urban distribution networks. Thus, this study proposes a deep reinforcement learning based multi-level dynamic reconfiguration method for urban distribution networks in a cloud-edge collaboration architecture to obtain a real-time optimal multi-level dynamic reconfiguration solution. First, the multi-level dynamic reconfiguration method is discussed, covering the feeder, transformer, and substation levels. Subsequently, a multi-agent system is combined with the cloud-edge collaboration architecture to build a deep reinforcement learning model for multi-level dynamic reconfiguration in an urban distribution network. The cloud-edge collaboration architecture effectively supports the multi-agent system's "centralized training and decentralized execution" operation mode and improves the learning efficiency of the model. Thereafter, for the multi-agent system, this study adopts a combination of offline and online learning to endow the model with the ability to automatically optimize and update its strategy. In the offline learning phase, a multi-agent conservative Q-learning (MACQL) algorithm is proposed to stabilize the learning results and reduce the risk of the subsequent online learning phase. In the online learning phase, a multi-agent deep deterministic policy gradient (MADDPG) algorithm based on policy gradients is proposed to explore the action space and update the experience pool. Finally, the effectiveness of the proposed method is verified through a simulation analysis of a real-world 445-node system.
Keywords: cloud-edge collaboration architecture, multi-agent deep reinforcement learning, multi-level dynamic reconfiguration, offline learning, online learning
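Conservative Q-learning, which MACQL extends to the multi-agent case, stabilizes offline training by adding a regularizer that pushes down Q-values for actions not in the dataset. A sketch of that per-state penalty term under standard CQL (the function name and discrete-action form are assumptions; the paper's exact multi-agent loss is not given in the abstract):

```python
import math

def cql_penalty(q_values, dataset_action):
    """CQL regularizer for one state: logsumexp(Q) - Q(s, a_dataset).

    Minimizing this alongside the usual TD loss suppresses Q-values of
    out-of-distribution actions, which is what makes purely offline
    learning from a fixed experience pool stable. Always non-negative
    when the dataset action is among the enumerated actions.
    """
    m = max(q_values)  # subtract the max for numerical stability
    lse = m + math.log(sum(math.exp(q - m) for q in q_values))
    return lse - q_values[dataset_action]
```

The penalty is then weighted and added to the Bellman error; the weight trades off conservatism (safe but pessimistic) against fidelity to the TD target.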
7. Applications and Challenges of Deep Reinforcement Learning in Multi-robot Path Planning (Cited by 1)
Authors: Tianyun Qiu, Yaxuan Cheng. Journal of Electronic Research and Application, 2021, No. 6, pp. 25-29 (5 pages).
With the rapid advancement of deep reinforcement learning (DRL) in multi-agent systems, a variety of practical application challenges and solutions in the direction of multi-agent deep reinforcement learning (MADRL) are surfacing. Path planning in a collision-free environment is essential for many robots to perform tasks quickly and efficiently, and path planning for multiple robots using deep reinforcement learning is a new research area in robotics and artificial intelligence. In this paper, we sort out the training methods for multi-robot path planning and summarize their practical applications in the field of DRL-based multi-robot path planning; finally, we suggest possible research directions for researchers.
Keywords: MADRL, deep reinforcement learning, multi-agent system, multi-robot, path planning
8. Deep Reinforcement Learning for Addressing Disruptions in Traffic Light Control
Authors: Faizan Rasheed, Kok-Lim Alvin Yau, Rafidah Md Noor, Yung-Wey Chong. Computers, Materials & Continua (SCIE, EI), 2022, No. 5, pp. 2225-2247 (23 pages).
This paper investigates the use of a multi-agent deep Q-network (MADQN) to address the curse-of-dimensionality issue that occurs in the traditional multi-agent reinforcement learning (MARL) approach. The proposed MADQN is applied to traffic light controllers at multiple intersections with busy traffic and traffic disruptions, particularly rainfall. MADQN is based on the deep Q-network (DQN), which integrates traditional reinforcement learning (RL) with the newly emerging deep learning (DL) approach. MADQN enables traffic light controllers to learn, exchange knowledge with neighboring agents, and select optimal joint actions in a collaborative manner. A case study based on a real traffic network is conducted as part of a sustainable urban city project in Sunway City, Kuala Lumpur, Malaysia. An investigation is also performed using a grid traffic network (GTN) to confirm that the proposed scheme is effective in a traditional traffic network. Our proposed scheme is evaluated using two simulation tools, namely Matlab and Simulation of Urban Mobility (SUMO), and is shown to reduce the cumulative delay of vehicles by up to 30% in the simulations.
Keywords: artificial intelligence, traffic light control, traffic disruptions, multi-agent deep Q-network, deep reinforcement learning
9. Learning-based user association and dynamic resource allocation in multi-connectivity enabled unmanned aerial vehicle networks
Authors: Zhipeng Cheng, Minghui Liwang, Ning Chen, Lianfen Huang, Nadra Guizani, Xiaojiang Du. Digital Communications and Networks (SCIE, CSCD), 2024, No. 1, pp. 53-62 (10 pages).
Using Unmanned Aerial Vehicles (UAVs) as aerial base stations to provide communication services for ground users is a flexible and cost-effective paradigm in B5G. Moreover, dynamic resource allocation and multi-connectivity can be adopted to further harness the potential of UAVs in improving communication capacity; in such situations, the interference among users becomes a pivotal disincentive requiring effective solutions. To this end, we investigate the Joint UAV-User Association, Channel Allocation, and transmission Power Control (J-UACAPC) problem in a multi-connectivity-enabled UAV network with constrained backhaul links, where each UAV can determine the reusable channels and transmission power to serve the selected ground users. The goal is to mitigate co-channel interference while maximizing long-term system utility. The problem is modeled as a cooperative stochastic game with a hybrid discrete-continuous action space, and a Multi-Agent Hybrid Deep Reinforcement Learning (MAHDRL) algorithm is proposed to address it. Extensive simulation results demonstrate the effectiveness of the proposed algorithm and show that it achieves higher system utility than the baseline methods.
Keywords: UAV-user association, multi-connectivity, resource allocation, power control, multi-agent deep reinforcement learning
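A hybrid discrete-continuous action, as used above, pairs a categorical choice (which channel) with a continuous one (transmit power). One common way to sample such an action is a softmax over channel logits plus a Gaussian power clipped to the feasible range; this sketch illustrates that generic pattern, not the paper's specific policy parameterization (function name and parameters are assumptions).

```python
import math
import random

def sample_hybrid_action(channel_logits, power_mean, power_std, p_max):
    """Sample one hybrid action: (discrete channel, continuous power).

    The channel is drawn from a softmax over logits; the transmit power
    is drawn from a Gaussian clipped to the feasible range [0, p_max].
    """
    m = max(channel_logits)
    probs = [math.exp(l - m) for l in channel_logits]  # unnormalized softmax
    total = sum(probs)
    r, acc, channel = random.random() * total, 0.0, len(probs) - 1
    for i, p in enumerate(probs):
        acc += p
        if r <= acc:
            channel = i
            break
    power = min(max(random.gauss(power_mean, power_std), 0.0), p_max)
    return channel, power
```

Gradient-based training typically handles the two halves with separate heads (categorical and Gaussian) on a shared network, which is the usual way MAHDRL-style methods cope with the mixed action space.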
10. Targeted multi-agent communication algorithm based on state control
Authors: Li-yang Zhao, Tian-qing Chang, Lei Zhang, Jie Zhang, Kai-xuan Chu, De-peng Kong. Defence Technology (SCIE, EI, CAS, CSCD), 2024, No. 1, pp. 544-556 (13 pages).
As an important mechanism in multi-agent interaction, communication can make agents form complex team relationships rather than constitute a simple set of multiple independent agents. However, existing communication schemes can introduce much timing redundancy and many irrelevant messages, which seriously affects their practical application. To solve this problem, this paper proposes a targeted multi-agent communication algorithm based on state control (SCTC). The SCTC uses a gating mechanism based on state control to reduce the timing redundancy of communication between agents, and it determines the interaction relationship between agents and the importance weight of a communication message through a series connection of hard- and self-attention mechanisms, realizing targeted communication message processing. In addition, by minimizing the difference between the fusion message generated from the real communication messages of each agent and a fusion message generated from the buffered messages, the correctness of the agent's final action choice is ensured. Our evaluation on a challenging set of StarCraft II benchmarks indicates that the SCTC can significantly improve learning performance and reduce the communication overhead between agents, thus ensuring better cooperation between agents.
Keywords: multi-agent deep reinforcement learning, state control, targeted interaction, communication mechanism
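The series connection of hard and soft attention described above can be pictured as: a binary hard gate first drops irrelevant agents outright, then dot-product self-attention weights the surviving messages. A minimal one-step sketch (names and the single-head, list-based form are illustrative assumptions; it assumes at least one gate is open):

```python
import math

def gated_attention(query, keys, values, gate):
    """Targeted message fusion: hard gating followed by soft attention.

    gate[i] is 1 to keep agent i's message and 0 to drop it. Gated-out
    agents get a score of -inf, so their softmax weight is exactly zero.
    """
    scores = []
    for k, g in zip(keys, gate):
        if g:
            dot = sum(q * kk for q, kk in zip(query, k))
            scores.append(dot / math.sqrt(len(query)))  # scaled dot product
        else:
            scores.append(float("-inf"))
    m = max(scores)
    weights = [math.exp(s - m) for s in scores]
    total = sum(weights)
    weights = [w / total for w in weights]
    dim = len(values[0])
    return [sum(w * v[c] for w, v in zip(weights, values)) for c in range(dim)]
```

The hard gate is what saves communication bandwidth (dropped agents need not transmit at all), while the soft weights decide how much each remaining message contributes to the fused message.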
11. MADDPG-D2: An Intelligent Dynamic Task Allocation Algorithm Based on Multi-Agent Architecture Driven by Prior Knowledge
Authors: Tengda Li, Gang Wang, Qiang Fu. Computer Modeling in Engineering & Sciences (SCIE, EI), 2024, No. 9, pp. 2559-2586 (28 pages).
Aiming at the problems of low solution accuracy and high decision pressure that a single agent faces in large-scale dynamic task allocation (DTA) with a high-dimensional decision space, this paper combines deep reinforcement learning (DRL) theory with a multi-agent architecture and proposes an improved Multi-Agent Deep Deterministic Policy Gradient algorithm (MADDPG-D2) with a dual experience replay pool and dual noise to improve the efficiency of DTA. The algorithm is based on the traditional Multi-Agent Deep Deterministic Policy Gradient (MADDPG) algorithm. It introduces a dual-noise mechanism to enlarge the action exploration space in the early stage of training and a dual experience pool to improve the data utilization rate. At the same time, to accelerate training and solve the cold-start problem, prior-knowledge techniques are applied to the training of the algorithm. Finally, MADDPG-D2 is compared and analyzed on a digital battlefield of ground-air confrontation. The experimental results show that agents trained by MADDPG-D2 achieve higher win rates and average rewards, utilize resources more reasonably, and better overcome the difficulty that traditional single-agent algorithms face in high-dimensional decision spaces. The MADDPG-D2 algorithm based on a multi-agent architecture thus shows superiority and rationality in DTA.
Keywords: deep reinforcement learning, dynamic task allocation, intelligent decision-making, multi-agent system, MADDPG-D2 algorithm
12. Reward Function Design Method for Long Episode Pursuit Tasks Under Polar Coordinate in Multi-Agent Reinforcement Learning
Authors: Dong Yubo, Cui Tao, Zhou Yufan, Song Xun, Zhu Yue, Dong Peng. Journal of Shanghai Jiaotong University (Science) (EI), 2024, No. 4, pp. 646-655 (10 pages).
Multi-agent reinforcement learning has recently been applied to solve pursuit problems. However, it suffers from a large number of time steps per training episode and thus always struggles to converge effectively, resulting in low rewards and an inability for agents to learn strategies. This paper proposes a deep reinforcement learning (DRL) training method that employs an ensemble segmented multi-reward function design approach to address this convergence problem. The ensemble reward function combines the advantages of two reward functions, which enhances the training effect of agents in long episodes. We then eliminate the non-monotonic behavior introduced into the reward function by the trigonometric functions in the traditional 2D polar-coordinate observation representation. Experimental results demonstrate that this method outperforms the traditional single-reward-function mechanism in the pursuit scenario by enhancing agents' policy scores on the task. These ideas offer a solution to the convergence challenges faced by DRL models in long-episode pursuit problems, leading to improved model training performance.
Keywords: multi-agent reinforcement learning, deep reinforcement learning (DRL), long episode, reward function
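The non-monotonicity mentioned above typically arises at the ±π boundary of polar-coordinate bearings, where a tiny heading change flips the raw angle difference. Wrapping the difference into (-π, π] removes the jump; a segmented reward can then combine a sparse capture bonus with dense monotonic shaping. Both functions below are hedged sketches, with names, coefficients, and segment boundaries chosen purely for illustration (the paper's exact reward design is not given in the abstract).

```python
import math

def wrapped_angle_error(target_bearing, heading):
    """Smallest signed angular difference, wrapped into [-pi, pi).

    Avoids the non-monotonic jump at the +/-pi boundary that raw
    trigonometric encodings of polar-coordinate observations introduce.
    """
    return (target_bearing - heading + math.pi) % (2 * math.pi) - math.pi

def pursuit_reward(distance, angle_error, capture_radius):
    """Segmented reward sketch: sparse capture bonus + dense shaping.

    Outside the capture radius, the shaping terms decrease monotonically
    with both distance and absolute bearing error.
    """
    if distance <= capture_radius:
        return 10.0
    return -0.1 * distance - 0.05 * abs(angle_error)
```

Because the wrapped error is continuous and monotone in true misalignment near the boundary, gradient-based policies no longer see a reward cliff when a pursuer crosses the -π/+π seam.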
13. MAQMC: Multi-Agent Deep Q-Network for Multi-Zone Residential HVAC Control
Authors: Zhengkai Ding, Qiming Fu, Jianping Chen, You Lu, Hongjie Wu, Nengwei Fang, Bin Xing. Computer Modeling in Engineering & Sciences (SCIE, EI), 2023, No. 9, pp. 2759-2785 (27 pages).
The optimization of multi-zone residential heating, ventilation, and air conditioning (HVAC) control is not an easy task due to its complex dynamic thermal model and the uncertainty of occupant-driven cooling loads. Deep reinforcement learning (DRL) methods have recently been proposed to address the HVAC control problem. However, applying single-agent DRL to multi-zone residential HVAC control may lead to non-convergence or slow convergence. In this paper, we propose MAQMC (Multi-Agent deep Q-network for multi-zone residential HVAC Control) to address this challenge, with the goal of minimizing energy consumption while maintaining occupants' thermal comfort. MAQMC comes in two variants: MAQMC2 (two agents: one controls the temperature of each zone, the other controls the humidity of each zone) and MAQMC3 (three agents: each controls the temperature and humidity of one of three zones). The experimental results show that MAQMC3 can reduce energy consumption by 6.27% and MAQMC2 by 3.73% compared with a fixed-point baseline; compared with a rule-based baseline, MAQMC3 and MAQMC2 can reduce comfort violations by 61.89% and 59.07%, respectively. In addition, experiments with weather data from different regions demonstrate that the well-trained MAQMC RL agents are robust and adaptable to unknown environments.
Keywords: deep reinforcement learning, multi-zone residential HVAC, multi-agent, energy conservation, comfort
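The dual objective above (minimize energy while keeping comfort) is commonly encoded as a single scalar reward: negative energy use minus a penalty proportional to how far temperature and humidity stray from their comfort bands. This is a generic sketch of that trade-off, with the function name, band representation, and weight `beta` all assumed for illustration rather than taken from the paper.

```python
def hvac_reward(energy_kwh, temp, humidity, temp_band, hum_band, beta=1.0):
    """Shared reward sketch for multi-agent HVAC control.

    Comfort violation is the distance of temperature and humidity from
    their comfort bands; beta trades comfort off against energy use.
    """
    t_lo, t_hi = temp_band
    h_lo, h_hi = hum_band
    t_viol = max(t_lo - temp, 0.0, temp - t_hi)
    h_viol = max(h_lo - humidity, 0.0, humidity - h_hi)
    return -energy_kwh - beta * (t_viol + h_viol)
```

With a shared reward of this shape, the MAQMC2/MAQMC3 agent decompositions differ only in which action dimensions each agent controls, not in the objective they optimize.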
14. Optimal Secondary Control of Islanded AC Microgrids with Communication Time-delay Based on Multi-agent Deep Reinforcement Learning
Authors: Yang Xia, Yan Xu, Yu Wang, Suman Mondal, Souvik Dasgupta, Amit K. Gupta. CSEE Journal of Power and Energy Systems (SCIE, EI, CSCD), 2023, No. 4, pp. 1301-1311 (11 pages).
In this paper, an optimal secondary control strategy is proposed for islanded AC microgrids considering communication time-delays. The proposed method is designed on the data-driven principle and consists of an offline training phase and an online application phase. For offline training, each control agent is formulated by a deep neural network (DNN) and trained within a multi-agent deep reinforcement learning (MA-DRL) framework. A deep deterministic policy gradient (DDPG) algorithm is improved and applied to search for an optimal secondary-control policy, where a global cost function is developed to evaluate the overall control performance. In addition, communication time-delays are introduced into the system to enrich the training scenarios, aiming to solve the time-delay problem in secondary control. In the online stage, each controller is deployed in a distributed way, requiring only local and neighboring information for each DG. Based on this, the well-trained controllers can provide optimal solutions under load variations and communication time-delays in online applications. Several case studies are conducted to validate the feasibility and stability of the proposed secondary control.
Keywords: communication time-delay, global cost function, islanded AC microgrid, multi-agent deep reinforcement learning (MA-DRL), secondary control
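Enriching training scenarios with communication time-delays, as described above, can be modeled with a simple FIFO channel: a message sent at step t is delivered at step t + delay. A minimal sketch of such a fixed-delay channel (the class name and fixed-delay assumption are illustrative; real channels may have stochastic delays):

```python
from collections import deque

class DelayedChannel:
    """Fixed communication time-delay model for neighbor-to-neighbor signals.

    A message pushed at step t is returned at step t + delay; until then,
    the receiver sees the `initial` placeholder value.
    """

    def __init__(self, delay, initial=None):
        # Pre-fill with `delay` placeholder messages.
        self.buffer = deque([initial] * delay, maxlen=delay + 1)

    def step(self, message):
        self.buffer.append(message)   # sent now...
        return self.buffer.popleft()  # ...delivered `delay` steps later
```

Wrapping each inter-agent observation in such a channel during training is one straightforward way to expose the DDPG agents to delayed information so the learned policy tolerates it online.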
15. A multi-agent deep reinforcement learning approach for solving the multi-depot vehicle routing problem
Authors: Ali Arishi, Krishna Krishnan. Journal of Management Analytics (EI), 2023, No. 3, pp. 493-515 (23 pages).
The multi-depot vehicle routing problem (MDVRP) is one of the most essential and useful variants of the traditional vehicle routing problem (VRP) in supply chain management (SCM) and logistics studies. Many supply chains (SCs) choose the joint distribution of multiple depots to cut transportation costs and delivery times. However, delivering high-quality, fast solutions for the MDVRP remains a challenging task. Traditional optimization approaches in operations research (OR) may not be practical for solving the MDVRP in real time. With the latest developments in artificial intelligence (AI), it becomes feasible to apply deep reinforcement learning (DRL) to combinatorial routing problems. This paper proposes a new multi-agent deep reinforcement learning (MADRL) model to solve the MDVRP. Extensive experiments are conducted to evaluate the performance of the proposed approach. Results show that the developed MADRL model can rapidly capture the relational information embedded in graphs and effectively produce quality solutions in real time.
Keywords: artificial intelligence, supply chain management, combinatorial optimization, multi-depot vehicle routing problem, multi-agent deep reinforcement learning
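To make the MDVRP structure concrete, here is a classic greedy baseline of the kind learned constructive policies are usually compared against: assign each customer to its nearest depot, then build each depot's route by nearest-neighbor insertion. This is a generic OR heuristic for illustration, not the paper's MADRL model.

```python
import math

def mdvrp_nearest_neighbor(depots, customers):
    """Greedy MDVRP baseline: nearest-depot assignment + nearest-neighbor routes.

    `depots` and `customers` are lists of (x, y) points. Returns a dict
    mapping each depot index to the ordered list of customer indices on
    its route (the vehicle implicitly starts and ends at its depot).
    """
    def dist(a, b):
        return math.hypot(a[0] - b[0], a[1] - b[1])

    # Phase 1: assign each customer to its nearest depot.
    assigned = {d: [] for d in range(len(depots))}
    for c, pos in enumerate(customers):
        d = min(range(len(depots)), key=lambda i: dist(depots[i], pos))
        assigned[d].append(c)

    # Phase 2: per depot, repeatedly visit the nearest unvisited customer.
    routes = {}
    for d, custs in assigned.items():
        route, here, remaining = [], depots[d], set(custs)
        while remaining:
            nxt = min(remaining, key=lambda c: dist(here, customers[c]))
            route.append(nxt)
            here = customers[nxt]
            remaining.remove(nxt)
        routes[d] = route
    return routes
```

A learned MADRL policy replaces both greedy phases with decisions conditioned on the whole graph, which is where the reported quality gains over such myopic construction come from.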
16. Locally generalised multi-agent reinforcement learning for demand and capacity balancing with customised neural networks (Cited by 1)
Authors: Yutong Chen, Minghua Hu, Yan Xu, Lei Yang. Chinese Journal of Aeronautics (SCIE, EI, CAS, CSCD), 2023, No. 4, pp. 338-353 (16 pages).
Reinforcement Learning (RL) techniques are being studied to solve Demand and Capacity Balancing (DCB) problems so as to fully exploit their computational performance. A locally generalised Multi-Agent Reinforcement Learning (MARL) approach for real-world DCB problems is proposed. The proposed method can deploy trained agents directly to unseen scenarios in a specific Air Traffic Flow Management (ATFM) region to quickly obtain a satisfactory solution. In this method, the agents of all flights in a scenario form a multi-agent decision-making system based on partial observation. A trained agent with a customised neural network can be deployed directly on the corresponding flight, allowing the agents to solve the DCB problem jointly. A cooperation coefficient is introduced into the reward function to adjust each agent's cooperation preference in the multi-agent system, thereby controlling the distribution of flight delay time. A multi-iteration mechanism is designed for the DCB decision-making framework to deal with problems arising from non-stationarity in MARL and to ensure that all hotspots are eliminated. Experiments based on large-scale, high-complexity real-world scenarios are conducted to verify the effectiveness and efficiency of the method. From a statistical point of view, the proposed method is shown to generalise within the scope of the flights and sectors of interest, and its optimisation performance outperforms standard computer-assisted slot allocation and state-of-the-art RL-based DCB methods. A sensitivity analysis preliminarily reveals the effect of the cooperation coefficient on delay time allocation.
Keywords: air traffic flow management, demand and capacity balancing, deep Q-learning network, flight delays, generalisation, ground delay program, multi-agent reinforcement learning
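A cooperation coefficient of the kind described above is often realised by blending each agent's individual reward with the team average. The sketch below shows that generic blending pattern; the function name, the linear form, and the use of a simple mean are assumptions for illustration, not the paper's exact formulation.

```python
def mixed_rewards(individual_rewards, coop):
    """Blend each agent's own reward with the fleet average.

    coop = 0 gives fully selfish agents; coop = 1 gives fully cooperative
    agents that all optimize the shared average. Intermediate values
    shift how flight delay is distributed across agents.
    """
    avg = sum(individual_rewards) / len(individual_rewards)
    return [(1.0 - coop) * r + coop * avg for r in individual_rewards]
```

Sweeping `coop` is exactly the kind of sensitivity analysis the abstract mentions: higher values spread delays more evenly across flights, while lower values let each flight minimize its own delay.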
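The cooperation coefficient described above can be illustrated with a minimal reward sketch. All names and the exact functional form are assumptions for illustration, not taken from the paper: it simply blends an agent's own delay penalty with the average delay of the other agents.

```python
def delay_reward(own_delay, other_delays, coop=0.5):
    """Reward for one flight agent: penalise its own delay plus, scaled by
    the cooperation coefficient `coop`, the mean delay of the other agents.
    coop=0 is fully selfish; coop=1 weighs others' delay like its own.
    Illustrative only -- the paper's actual reward may differ."""
    others = sum(other_delays) / len(other_delays) if other_delays else 0.0
    return -(1 - coop) * own_delay - coop * others
```

Sweeping `coop` from 0 to 1 shifts the optimal policy from minimising an agent's own delay toward minimising the fleet's average delay, which is how such a coefficient can control how delay is distributed across flights.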
Towards a multi-agent reinforcement learning approach for joint sensing and sharing in cognitive radio networks
17
Authors: Kagiso Rapetswa, Ling Cheng. Intelligent and Converged Networks (EI), 2023, Issue 1, pp. 50-75 (26 pages)
The adoption of the Fifth Generation (5G) and beyond 5G networks is driving the demand for learning approaches that enable users to co-exist harmoniously in a multi-user distributed environment. Although resource-constrained, the Cognitive Radio (CR) has been identified as a key enabler of distributed 5G and beyond networks due to its cognitive abilities and ability to access idle spectrum opportunistically. Reinforcement learning is well suited to meet the demand for learning in 5G and beyond 5G networks because it does not require the learning agent to have prior information about the environment in which it operates. Intuitively, CRs should be enabled to implement reinforcement learning to efficiently gain opportunistic access to spectrum and co-exist with each other. However, the application of reinforcement learning is straightforward in a single-agent environment and complex and resource-intensive in a multi-agent and multi-objective learning environment. In this paper, (1) we present a brief history and overview of reinforcement learning and its limitations; (2) we provide a review of recent multi-agent learning methods proposed and multi-agent learning algorithms applied in Cognitive Radio (CR) networks; and (3) we further present a novel framework for multi-CR reinforcement learning and conclude with a synopsis of future research directions and recommendations.
Keywords: cognitive radio; multi-agent reinforcement learning; deep reinforcement learning; mean field reinforcement learning; organic computing
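The multi-agent spectrum-access problem the survey describes can be sketched with independent tabular learners contending for channels. This is a toy stateless bandit-style model, not the survey's framework; all names and the collision reward are assumptions.

```python
import random

class ChannelAgent:
    """Illustrative tabular learner for opportunistic channel access.
    Stateless Q-values over channels, updated toward observed reward."""
    def __init__(self, n_channels, eps=0.1, alpha=0.2):
        self.q = [0.0] * n_channels
        self.eps, self.alpha = eps, alpha

    def act(self, rng):
        # epsilon-greedy: explore a random channel, else exploit best Q.
        if rng.random() < self.eps:
            return rng.randrange(len(self.q))
        return max(range(len(self.q)), key=self.q.__getitem__)

    def learn(self, ch, reward):
        self.q[ch] += self.alpha * (reward - self.q[ch])

def run(n_agents=2, n_channels=3, steps=2000, seed=0):
    rng = random.Random(seed)
    agents = [ChannelAgent(n_channels) for _ in range(n_agents)]
    for _ in range(steps):
        picks = [a.act(rng) for a in agents]
        for a, ch in zip(agents, picks):
            # Collision (another agent on the same channel) yields 0, else 1.
            reward = 1.0 if picks.count(ch) == 1 else 0.0
            a.learn(ch, reward)
    # Greedy (exploitation-only) channel of each agent after training.
    return [max(range(n_channels), key=a.q.__getitem__) for a in agents]
```

With two agents and three channels, the learners typically settle on distinct channels, which is the co-existence behaviour the survey's methods aim to achieve at scale.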
Transformer in reinforcement learning for decision-making: a survey
18
Authors: Weilin YUAN, Jiaxing CHEN, Shaofei CHEN, Dawei FENG, Zhenzhen HU, Peng LI, Weiwei ZHAO. Frontiers of Information Technology & Electronic Engineering (SCIE EI CSCD), 2024, Issue 6, pp. 763-790 (28 pages)
Reinforcement learning (RL) has become a dominant decision-making paradigm and has achieved notable success in many real-world applications. Notably, deep neural networks play a crucial role in unlocking RL's potential in large-scale decision-making tasks. Inspired by the current major success of Transformer in natural language processing and computer vision, numerous bottlenecks have been overcome by combining Transformer with RL for decision-making. This paper presents a multiangle systematic survey of various Transformer-based RL (TransRL) models applied in decision-making tasks, including basic models, advanced algorithms, representative implementation instances, typical applications, and known challenges. Our work aims to provide insights into problems that inherently arise with the current RL approaches, and examines how we can address them with better TransRL models. To our knowledge, we are the first to present a comprehensive review of the recent Transformer research developments in RL for decision-making. We hope that this survey provides a comprehensive review of TransRL models and inspires the RL community in its pursuit of future directions. To keep track of the rapid TransRL developments in the decision-making domains, we summarize the latest papers and their open-source implementations at https://github.com/williamyuanv0/Transformer-in-Reinforcement-Learning-for-Decision-Making-A-Survey.
Keywords: Transformer; reinforcement learning (RL); decision-making (DM); deep neural network (DNN); multi-agent reinforcement learning (MARL); meta-reinforcement learning (Meta-RL)
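A representative TransRL idea is casting RL as sequence modelling: return-conditioned models such as Decision Transformer feed the attention stack an interleaved stream of returns-to-go, states, and actions. The tokenisation step can be sketched as follows; real models embed each modality and apply causal attention, and the function name here is an assumption.

```python
def to_rtg_sequence(rewards, states, actions):
    """Build the interleaved (return-to-go, state, action) token stream used
    by return-conditioned sequence models for RL. Illustrative sketch only:
    actual models map each tuple element through a learned embedding."""
    rtg, out, total = [], [], 0.0
    for r in reversed(rewards):        # returns-to-go, accumulated backwards
        total += r
        rtg.append(total)
    rtg.reverse()
    for g, s, a in zip(rtg, states, actions):
        out.extend([("rtg", g), ("state", s), ("action", a)])
    return out
```

At inference time such a model is prompted with a desired return-to-go and the current state, and it autoregressively produces the next action token.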
Approximating Nash Equilibrium in Day-ahead Electricity Market Bidding with Multi-agent Deep Reinforcement Learning (cited: 8)
19
Authors: Yan Du, Fangxing Li, Helia Zandi, Yaosuo Xue. Journal of Modern Power Systems and Clean Energy (SCIE EI CSCD), 2021, Issue 3, pp. 534-544 (11 pages)
In this paper, a day-ahead electricity market bidding problem with multiple strategic generation company (GENCO) bidders is studied. The problem is formulated as a Markov game model, where GENCO bidders interact with each other to develop their optimal day-ahead bidding strategies. Considering unobservable information in the problem, a model-free and data-driven approach, known as multi-agent deep deterministic policy gradient (MADDPG), is applied for approximating the Nash equilibrium (NE) in the above Markov game. The MADDPG algorithm has the advantage of generalization due to the automatic feature extraction ability of the deep neural networks. The algorithm is tested on an IEEE 30-bus system with three competitive GENCO bidders in both an uncongested case and a congested case. Comparisons with a truthful bidding strategy and state-of-the-art deep reinforcement learning methods including deep Q network and deep deterministic policy gradient (DDPG) demonstrate that the applied MADDPG algorithm can find a superior bidding strategy for all the market participants with increased profit gains. In addition, the comparison with a conventional model-based method shows that the MADDPG algorithm has higher computational efficiency, which is feasible for real-world applications.
Keywords: Bidding strategy; day-ahead electricity market; deep reinforcement learning; Markov game; multi-agent deep deterministic policy gradient (MADDPG); Nash equilibrium (NE)
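What "approximating a Nash equilibrium" in a bidding game means can be shown without MADDPG's neural machinery. As a conceptual stand-in (not the paper's method), a textbook iterated-best-response loop on a linear two-firm Cournot market converges to the analytic NE quantity (a - c) / (3b); all parameter values are illustrative.

```python
def best_response(q_other, a=100.0, b=1.0, c=10.0):
    """Profit-maximising quantity against a rival's quantity in a linear
    Cournot market: price = a - b*(q1 + q2), constant marginal cost c.
    The analytic best response is (a - c - b*q_other) / (2b)."""
    return max(0.0, (a - c - b * q_other) / (2 * b))

def iterate_to_equilibrium(steps=50):
    """Alternate best responses until the quantities stop moving; the fixed
    point is the Nash equilibrium (here (a - c)/(3b) = 30 for each firm)."""
    q1 = q2 = 0.0
    for _ in range(steps):
        q1 = best_response(q2)
        q2 = best_response(q1)
    return q1, q2
```

MADDPG pursues the same fixed point in settings where payoffs are unobservable and must be learned from interaction, which is why a model-free learner is needed in the actual market problem.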
Multi-agent deep reinforcement learning for end-edge orchestrated resource allocation in industrial wireless networks (cited: 2)
20
Authors: Xiaoyu LIU, Chi XU, Haibin YU, Peng ZENG. Frontiers of Information Technology & Electronic Engineering (SCIE EI CSCD), 2022, Issue 1, pp. 47-60 (14 pages)
Edge artificial intelligence will empower the ever simple industrial wireless networks (IWNs) supporting complex and dynamic tasks by collaboratively exploiting the computation and communication resources of both machine-type devices (MTDs) and edge servers. In this paper, we propose a multi-agent deep reinforcement learning based resource allocation (MADRL-RA) algorithm for end-edge orchestrated IWNs to support computation-intensive and delay-sensitive applications. First, we present the system model of IWNs, wherein each MTD is regarded as a self-learning agent. Then, we apply the Markov decision process to formulate a minimum system overhead problem with joint optimization of delay and energy consumption. Next, we employ MADRL to defeat the explosive state space and learn an effective resource allocation policy with respect to computing decision, computation capacity, and transmission power. To break the time correlation of training data while accelerating the learning process of MADRL-RA, we design a weighted experience replay to store and sample experiences categorically. Furthermore, we propose a step-by-step ε-greedy method to balance exploitation and exploration. Finally, we verify the effectiveness of MADRL-RA by comparing it with some benchmark algorithms in many experiments, showing that MADRL-RA converges quickly and learns an effective resource allocation policy achieving the minimum system overhead.
Keywords: multi-agent deep reinforcement learning; end-edge orchestrated; industrial wireless networks; delay; energy consumption
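The two training tricks named above, categorical weighted replay and step-by-step ε-greedy, can be sketched in a few lines. The details below (per-category buffers, decay-by-factor schedule, all names) are assumed readings of the abstract, not the paper's exact design.

```python
import random
from collections import deque

class WeightedReplay:
    """Illustrative categorical replay: experiences are stored per category
    and sampled with per-category weights, loosely mirroring a 'weighted
    experience replay' that stores and samples experiences categorically."""
    def __init__(self, capacity=1000):
        self.buffers = {}
        self.capacity = capacity

    def add(self, category, experience):
        self.buffers.setdefault(
            category, deque(maxlen=self.capacity)).append(experience)

    def sample(self, weights, k, rng):
        cats = list(weights)
        batch = []
        for _ in range(k):
            # Pick a category by weight, then an experience uniformly.
            cat = rng.choices(cats, weights=[weights[c] for c in cats])[0]
            if self.buffers.get(cat):
                batch.append(rng.choice(self.buffers[cat]))
        return batch

def stepwise_epsilon(step, eps_start=1.0, eps_min=0.05,
                     decay_every=100, factor=0.9):
    """Step-by-step ε-greedy: decay ε by a fixed factor every `decay_every`
    steps (in discrete plateaus rather than continuously), floored at eps_min."""
    return max(eps_min, eps_start * factor ** (step // decay_every))
```

Plateaued ε-decay keeps exploration constant within each stage of training while still annealing toward exploitation, and weighted sampling lets scarce-but-important experience categories appear in batches at a controlled rate.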