Multi-robot reinforcement learning path planning method based on a request-response communication mechanism and a local attention mechanism

Abstract: To reduce the blocking rate of multi-robot path planning in dynamic environments, a distributed deep reinforcement learning path planning method, Distributed Communication and local Attention based Multi-Agent Path Finding (DCAMAPF), is proposed within the Actor-Critic framework, using a request-response communication mechanism and a local attention mechanism. In the Actor network, each robot requests the local observation and action information of the other robots within its field of view through the request-response communication mechanism, and plans a coordinated action policy accordingly. In the Critic network, each robot uses the local attention mechanism to dynamically allocate attention weights over the local observation and action information of the robots in its field of view that responded successfully. Experimental results show that, compared with the traditional dynamic path planning method D* Lite, the latest distributed reinforcement learning method MAPPER, and the latest centralized reinforcement learning method AB-MAPPER (Attention and BicNet based MAPPER), DCAMAPF reduces the mean blocking rate by approximately 6.91, 4.97, and 3.56 percentage points, respectively, in the discrete initialization environment. In the centralized initialization environment, it avoids blocking more efficiently, reducing the mean blocking rate by approximately 15.86, 11.71, and 5.54 percentage points, respectively, while also occupying less computing cache. The proposed method therefore ensures the efficiency of path planning and is applicable to multi-robot path planning tasks in different dynamic environments.
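The two mechanisms named in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not give the network architecture, so the square field of view (Chebyshev distance), the `respond_prob` parameter, the feature vectors, the scaled dot-product scoring, and all function names below are assumptions chosen for illustration only.

```python
import math
import random


def neighbors_in_view(positions, i, fov_radius):
    """Indices of robots inside robot i's field of view.

    Assumption: the field of view is a square window, so the
    Chebyshev distance is compared against fov_radius.
    """
    xi, yi = positions[i]
    return [
        j for j, (x, y) in enumerate(positions)
        if j != i and max(abs(x - xi), abs(y - yi)) <= fov_radius
    ]


def request_responses(positions, i, fov_radius, features,
                      respond_prob=1.0, rng=None):
    """Robot i requests features from robots in view and keeps the replies.

    Assumption: features[j] stands in for robot j's encoded local
    observation and action; a reply succeeds with probability respond_prob.
    """
    rng = rng or random.Random(0)
    return {
        j: features[j]
        for j in neighbors_in_view(positions, i, fov_radius)
        if rng.random() < respond_prob
    }


def attention_weights(query, responses):
    """Softmax of scaled dot-product scores over the responders only."""
    if not responses:
        return {}
    scale = math.sqrt(len(query))
    scores = {
        j: sum(q * k for q, k in zip(query, key)) / scale
        for j, key in responses.items()
    }
    top = max(scores.values())  # subtract the max for numerical stability
    exps = {j: math.exp(s - top) for j, s in scores.items()}
    total = sum(exps.values())
    return {j: e / total for j, e in exps.items()}


# Hypothetical toy scene: four robots on a grid, robot 0 is the requester.
positions = [(0, 0), (1, 1), (5, 5), (2, 0)]
features = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.0, 1.0]]

responses = request_responses(positions, 0, fov_radius=2, features=features)
weights = attention_weights(features[0], responses)
print(sorted(responses))  # robots that were in view and replied
print(weights)            # attention weights, summing to 1 over responders
```

In this sketch a robot outside the field of view, or one that fails to reply, never enters the softmax, which is one plausible reading of attention being "local" to successfully responding robots.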

Authors: DENG Fuqin; GUAN Huifeng; TAN Chaoen; FU Lanhui; WANG Hongmin; LAM Tinlun; ZHANG Jianmin

Affiliations: School of Intelligent Manufacturing, Wuyi University, Jiangmen, Guangdong 529000, China; Shenzhen Institute of Artificial Intelligence and Robotics for Society, The Chinese University of Hong Kong (Shenzhen), Shenzhen, Guangdong 518000, China; Shenzhen 3irobotix Company Limited, Shenzhen, Guangdong 518000, China

Source: Journal of Computer Applications (《计算机应用》), CSCD, Peking University Core Journal, 2024, No. 2, pp. 432-438 (7 pages)

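Funding: National Key Research and Development Program of China (2020YFB1313300); Shenzhen Science and Technology Program (KQTD2016113010470345); Exploratory Research Project of the Shenzhen Institute of Artificial Intelligence and Robotics for Society (AC01202101103); Wuyi University Industry-Sponsored Project (33520098).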

Keywords: multi-agent path finding; deep reinforcement learning; attention mechanism; communication; dynamic environment

Classification code: TP242 [Automation and Computer Technology: Detection Technology and Automation Devices]



