Journal Articles
3 articles found
1. Research on Path Planning Based on the DDPG Algorithm (cited by: 1)
Authors: Zhang Yi, Guo Kun. Computer Knowledge and Technology (《电脑知识与技术》), 2021, Issue 4, pp. 193-194, 200 (3 pages)
Path planning is a classic problem in artificial intelligence, with wide applications in national defense, road traffic, robot simulation, and many other fields. However, most existing path-planning algorithms suffer from restriction to a single environment, discrete action spaces, and the need for manually constructed models. Reinforcement learning is a machine learning method in which the agent interacts with the environment on its own, without manually provided training data, and the development of deep reinforcement learning has further improved its ability to solve real-world problems. This paper applies the DDPG (Deep Deterministic Policy Gradient) algorithm of deep reinforcement learning to path planning, achieving path planning in continuous action spaces and complex environments.
Keywords: path planning, deep reinforcement learning, DDPG, actor-critic, continuous action space
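DDPG's core loop pairs a deterministic actor with a bootstrapped critic and slowly tracking target networks. The sketch below illustrates one critic update with Polyak averaging; it is not the paper's implementation, and the linear function approximators, dimensions, and learning rate are invented for the example:

```python
import numpy as np

rng = np.random.default_rng(0)
state_dim, action_dim = 3, 1
gamma, tau, lr = 0.99, 0.05, 0.01

# Hypothetical linear approximators: Q(s, a) = w_q . [s; a], mu(s) = W_mu @ s
w_q = rng.normal(size=state_dim + action_dim)
W_mu = rng.normal(size=(action_dim, state_dim))
w_q_t, W_mu_t = w_q.copy(), W_mu.copy()        # target networks

def mu(W, s):                                  # deterministic policy
    return W @ s

def q(w, s, a):                                # action-value estimate
    return w @ np.concatenate([s, a])

# One critic step on a single transition (s, a, r, s2)
s = rng.normal(size=state_dim)
a = np.array([0.1])
r, s2 = 1.0, rng.normal(size=state_dim)

# Bootstrapped target uses the *target* actor and critic: y = r + gamma * Q'(s2, mu'(s2))
y = r + gamma * q(w_q_t, s2, mu(W_mu_t, s2))
td_err = q(w_q, s, a) - y
w_q = w_q - lr * td_err * np.concatenate([s, a])   # gradient of squared TD error

# Polyak averaging keeps the targets slowly tracking the learned networks
w_q_t = tau * w_q + (1 - tau) * w_q_t
```

Because the policy is deterministic, exploration in practice comes from action noise added during data collection, which is what allows continuous action spaces without discretization.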
2. Multi-Agent Deep Reinforcement Learning for Cross-Layer Scheduling in Mobile Ad-Hoc Networks
Authors: Xinxing Zheng, Yu Zhao, Joohyun Lee, Wei Chen. China Communications (SCIE, CSCD), 2023, Issue 8, pp. 78-88 (11 pages)
Due to the fading characteristics of wireless channels and the burstiness of data traffic, how to deal with congestion in ad-hoc networks with effective algorithms remains open and challenging. In this paper, we focus on enabling congestion control to minimize network transmission delays through flexible power control. To effectively solve the congestion problem, we propose a distributed cross-layer scheduling algorithm, which is empowered by graph-based multi-agent deep reinforcement learning. The transmit power is adaptively adjusted in real time by our algorithm based only on local information (i.e., channel state information and queue length) and local communication (i.e., information exchanged with neighbors). Moreover, the training complexity of the algorithm is low due to the regional cooperation based on the graph attention network. In the evaluation, we show that our algorithm can reduce the transmission delay of data flows under severe signal interference and drastically changing channel states, and demonstrate its adaptability and stability in different topologies. The method is general and can be extended to various types of topologies.
Keywords: ad-hoc network, cross-layer scheduling, multi-agent deep reinforcement learning, interference elimination, power control, queue scheduling, actor-critic methods, Markov decision process
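The scheduling approach above relies only on local observations (queue length, channel state) and one-hop neighbor exchange, aggregated via graph attention. A toy sketch of that attention-weighted aggregation step follows; the topology, observation layout, and all weights here are hypothetical illustrations, not the paper's trained network:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 4                                       # nodes in the ad-hoc network
adj = np.array([[0, 1, 1, 0],
                [1, 0, 1, 1],
                [1, 1, 0, 1],
                [0, 1, 1, 0]], float)       # hypothetical one-hop topology

# Local observation per node: (queue length, channel gain)
obs = np.stack([rng.uniform(0, 10, n),      # queue lengths
                rng.uniform(0.1, 1, n)]).T  # channel state info

W = rng.normal(size=(2, 2))                 # shared feature transform
h = obs @ W                                 # node embeddings

# Attention scores computed only over one-hop neighbours (local communication)
scores = h @ h.T
scores[adj == 0] = -np.inf                  # mask non-neighbours
alpha = np.exp(scores - scores.max(axis=1, keepdims=True))
alpha[adj == 0] = 0.0                       # masked entries contribute nothing
alpha /= alpha.sum(axis=1, keepdims=True)   # rows sum to 1

agg = alpha @ h                             # neighbour-aggregated features
# Each agent maps its aggregated features to a transmit power level in (0, 1)
power = 1.0 / (1.0 + np.exp(-(agg @ rng.normal(size=2))))
```

In the actual method these per-node outputs would feed an actor-critic learner, so each agent's power decision reflects both its own queue/channel state and its neighbors' — without any global coordinator.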
3. A new noise network and gradient parallelisation-based asynchronous advantage actor-critic algorithm
Authors: Zhengshun Fei, Yanping Wang, Jinglong Wang, Kangling Liu, Bingqiang Huang, Ping Tan. IET Cyber-Systems and Robotics (EI), 2022, Issue 3, pp. 175-188 (14 pages)
The asynchronous advantage actor-critic (A3C) algorithm is a commonly used policy optimization algorithm in reinforcement learning, in which "asynchronous" refers to parallel interactive sampling and training, and "advantage" refers to a multi-step reward-sampling estimation method for computing update weights. To address the low efficiency and insufficient convergence caused by the traditional heuristic exploration of the A3C algorithm, an improved A3C algorithm is proposed in this paper. In this algorithm, a noise network function, which updates the noise tensor in an explicit way, is constructed to train the agent. Generalised advantage estimation (GAE) is also adopted to compute the advantage function. Finally, a new mean-gradient parallelisation method is designed to update the parameters in both the primary and secondary networks by summing and averaging the gradients passed from all the sub-processes to the main process. Simulation experiments were conducted in a Gym environment using the PyTorch Agent Net (PTAN) reinforcement learning library, and the results show that the method enables the agent to complete training faster and converge better during training. The improved A3C algorithm outperforms the original algorithm and can provide new ideas for subsequent research on reinforcement learning algorithms.
Keywords: asynchronous advantage actor-critic (A3C), generalised advantage estimation (GAE), parallelisation, reinforcement learning
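GAE, which the improved A3C above adopts, has a compact concrete form: the advantage is an exponentially weighted sum of TD errors, A_t = Σ_l (γλ)^l δ_{t+l} with δ_t = r_t + γV(s_{t+1}) − V(s_t). A minimal sketch of the standard recursion (not taken from the paper's code):

```python
import numpy as np

def gae(rewards, values, gamma=0.99, lam=0.95):
    """Generalised advantage estimation over one trajectory.

    `values` has length len(rewards) + 1: per-state value estimates
    with a bootstrap value for the final state appended.
    """
    rewards = np.asarray(rewards, float)
    values = np.asarray(values, float)
    # One-step TD errors: delta_t = r_t + gamma * V(s_{t+1}) - V(s_t)
    deltas = rewards + gamma * values[1:] - values[:-1]
    adv = np.zeros_like(deltas)
    acc = 0.0
    # Backward recursion: A_t = delta_t + gamma * lam * A_{t+1}
    for t in reversed(range(len(deltas))):
        acc = deltas[t] + gamma * lam * acc
        adv[t] = acc
    return adv
```

Setting lam=0 recovers plain one-step TD errors (low variance, more bias), while lam=1 recovers full Monte Carlo returns minus the baseline (low bias, more variance); λ interpolates between the two, which is why GAE is a common drop-in for A3C's multi-step estimator.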