Multi-robot reinforcement learning path planning method based on request-response communication mechanism and local attention mechanism
Abstract To reduce the blocking rate of multi-robot path planning in dynamic environments, a distributed deep reinforcement learning path planning method, Distributed Communication and local Attention based Multi-Agent Path Finding (DCAMAPF), was proposed on the Actor-Critic framework, using a request-response communication mechanism and a local attention mechanism. In the Actor network, each robot requested the local observation and action information of the other robots in its field of view through the request-response communication mechanism, and planned a coordinated action strategy accordingly. In the Critic network, each robot used the local attention mechanism to dynamically allocate attention weights to the local observation and action information of the other robots in its field of view that had responded successfully. Experimental results showed that, compared with the traditional dynamic path planning method D*Lite, the latest distributed reinforcement learning method MAPPER, and the latest centralized reinforcement learning method AB-MAPPER (Attention and BicNet based MAPPER), DCAMAPF reduced the mean blocking rate by approximately 6.91, 4.97, and 3.56 percentage points, respectively, in the discrete initialization environment; in the centralized initialization environment it avoided blocking more efficiently, reducing the mean blocking rate by approximately 15.86, 11.71, and 5.54 percentage points, respectively, while also occupying less computing cache. The proposed method therefore ensures the efficiency of path planning and is applicable to multi-robot path planning tasks in different dynamic environments.
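The Critic-side idea in the abstract (attend only to in-view robots that responded successfully) can be illustrated with a minimal sketch. This is not the authors' implementation: the square field of view, the `responded` flags, the function names, and the use of plain scaled dot-product attention over equal-length embeddings are all assumptions made for illustration.

```python
import math

def neighbors_in_view(positions, i, view_radius):
    """Indices of robots inside robot i's field of view.

    Assumption: the field of view is a square of half-width view_radius
    (Chebyshev distance) on a grid; the abstract does not specify its shape.
    """
    xi, yi = positions[i]
    return [j for j, (x, y) in enumerate(positions)
            if j != i and max(abs(x - xi), abs(y - yi)) <= view_radius]

def local_attention(query, keys, values):
    """Scaled dot-product attention restricted to the given responders.

    Returns (weights, context). With no responders the weights are empty and
    the context is a zero vector (assumes values share the query's length).
    """
    if not keys:
        return [], [0.0] * len(query)
    scale = math.sqrt(len(query))
    scores = [sum(q * k for q, k in zip(query, key)) / scale for key in keys]
    top = max(scores)                      # subtract the max for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    dim = len(values[0])
    context = [sum(w * v[d] for w, v in zip(weights, values)) for d in range(dim)]
    return weights, context

# Hypothetical usage: robot 0 requests information from the robots it can see,
# then attends only to those whose response arrived.
positions = [(0, 0), (1, 1), (5, 5), (2, 0)]
responded = {1: True, 3: False}            # assumed outcome of the request-response step
embeddings = {1: [1.0, 0.0], 3: [0.0, 1.0]}  # made-up observation-action embeddings
responders = [j for j in neighbors_in_view(positions, 0, 2) if responded.get(j, False)]
weights, context = local_attention([1.0, 0.0],
                                   [embeddings[j] for j in responders],
                                   [embeddings[j] for j in responders])
```

Filtering before the softmax, rather than zeroing weights afterwards, keeps the weights over the responding robots summing to one.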
Authors DENG Fuqin; GUAN Huifeng; TAN Chaoen; FU Lanhui; WANG Hongmin; LAM Tinlun; ZHANG Jianmin (School of Intelligent Manufacturing, Wuyi University, Jiangmen, Guangdong 529000, China; Shenzhen Institute of Artificial Intelligence and Robotics for Society, The Chinese University of Hong Kong (Shenzhen), Shenzhen, Guangdong 518000, China; Shenzhen 3irobotix Company Limited, Shenzhen, Guangdong 518000, China)
Source Journal of Computer Applications (《计算机应用》), indexed in CSCD and the Peking University Core Journals list, 2024, No. 2, pp. 432-438 (7 pages)
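Funding National Key Research and Development Program of China (2020YFB1313300); Shenzhen Science and Technology Program (KQTD2016113010470345); Exploratory Research Project of the Shenzhen Institute of Artificial Intelligence and Robotics for Society (AC01202101103); Wuyi University Industry-Sponsored Project (33520098).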
Keywords multi-agent path finding; deep reinforcement learning; attention mechanism; communication; dynamic environment