Abstract
Deep reinforcement learning (DRL) has been successfully applied to mobile robot path planning. DRL-based path planning methods are suitable for high-dimensional environments and are an important route to autonomous learning in mobile robots. However, training a DRL model requires a large amount of interaction experience with the environment, which entails a high computational cost, and the limited capacity of the experience replay buffer in DRL algorithms cannot guarantee that experiences are used effectively. Spiking neural networks (SNNs), one of the principal tools of brain-inspired computing, are well suited to robotic environment perception and control owing to their unique biological plausibility and their ability to incorporate spatial and temporal information simultaneously. Combining SNNs, convolutional neural networks (CNNs), and policy fusion, this paper studies DRL-based mobile robot path planning and makes the following contributions. 1) The SCDDPG (Spike Convolutional DDPG) algorithm is proposed; it uses CNNs for multi-channel feature extraction from input states and SNNs for spatio-temporal learning on the extracted features. 2) Building on SCDDPG, the SC2DDPG (State Constraint SCDDPG) algorithm is proposed; it constrains the robot's operating states with a designed state-constraint policy, which avoids unnecessary exploration of the environment and speeds up the convergence of the DRL model in SC2DDPG. 3) Also building on SCDDPG, the PFTDDPG (Policy Fusion and Transfer SCDDPG) algorithm is proposed; it fuses a staged control mode with the DRL algorithm, applies a wall-following policy to pass the wedge-shaped obstacles in the environment, and introduces transfer learning to transfer prior knowledge between policies. PFTDDPG not only completes path planning tasks that cannot be solved by RL alone but also yields optimal collision-free paths, while further improving the convergence speed of the DRL model and the quality of the planned path. Experimental results validate the effectiveness of the three proposed algorithms, and comparative experiments show that, among SpikeDDPG, SCDDPG, SC2DDPG, and PFTDDPG, the PFTDDPG algorithm performs best in path planning success rate, training convergence speed, and planned path length. This work offers new ideas for mobile robot path planning and enriches the family of DRL-based solutions to the problem.
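The following minimal sketches illustrate the three ideas summarized above. First, a sketch of the SCDDPG actor in PyTorch: a CNN extracts multi-channel spatial features from the input state, and a layer of leaky integrate-and-fire (LIF) neurons accumulates them over discrete time steps before the continuous action head. The layer sizes, simulation length T, and LIF constants are illustrative assumptions, not the paper's exact architecture.

```python
import torch
import torch.nn as nn

class LIFLayer(nn.Module):
    """Leaky integrate-and-fire neurons (assumed dynamics, not the paper's)."""
    def __init__(self, tau: float = 2.0, v_th: float = 1.0):
        super().__init__()
        self.tau, self.v_th = tau, v_th

    def forward(self, current: torch.Tensor, v: torch.Tensor):
        v = v + (current - v) / self.tau       # leaky membrane update
        spike = (v >= self.v_th).float()       # hard threshold
        v = v * (1.0 - spike)                  # reset fired neurons
        return spike, v

class SpikeConvActor(nn.Module):
    """CNN extracts spatial features; an SNN layer runs them for T steps."""
    def __init__(self, in_ch: int = 3, act_dim: int = 2, T: int = 8):
        super().__init__()
        self.T = T
        self.cnn = nn.Sequential(
            nn.Conv2d(in_ch, 16, 5, stride=2), nn.ReLU(),
            nn.Conv2d(16, 32, 3, stride=2), nn.ReLU(),
            nn.AdaptiveAvgPool2d(4), nn.Flatten(),
            nn.Linear(32 * 16, 128),
        )
        self.lif = LIFLayer()
        self.head = nn.Linear(128, act_dim)

    def forward(self, obs: torch.Tensor) -> torch.Tensor:
        feat = self.cnn(obs)                   # static spatial features
        v = torch.zeros_like(feat)
        rate = torch.zeros_like(feat)
        for _ in range(self.T):                # temporal spike accumulation
            spike, v = self.lif(feat, v)
            rate = rate + spike / self.T       # average firing rate
        return torch.tanh(self.head(rate))     # bounded continuous action
```

Only the actor side is shown; in a DDPG setup a critic would consume the same CNN features together with the action.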
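Second, a sketch of the state-constraint idea in SC2DDPG, assuming the action is a (linear velocity, angular velocity) pair and the state exposes the nearest-obstacle distance. The bounds and the `min_obstacle_dist` field are hypothetical placeholders for whatever constraints the paper's policy encodes.

```python
import numpy as np

V_MAX, W_MAX = 0.5, 1.0   # assumed velocity limits (m/s, rad/s)
D_SAFE = 0.3              # assumed minimum obstacle clearance (m)

def constrain_action(action, state):
    """Project the raw DRL action into the allowed operating region."""
    v, w = np.clip(action, [-V_MAX, -W_MAX], [V_MAX, W_MAX])
    if state["min_obstacle_dist"] < D_SAFE:
        v = min(v, 0.0)   # forbid moving toward a nearby obstacle
    return np.array([v, w])
```

Filtering actions this way prunes states the robot never needs to visit, which is one plausible mechanism for the faster convergence the abstract reports.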
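Third, a sketch of PFTDDPG's staged policy fusion, assuming a simple trigger for wedge-shaped obstacle regions and a hand-crafted wall-following rule; the names `in_wedge_region` and `right_wall_dist` are hypothetical, and the transfer step is shown only as a warm-start comment.

```python
import numpy as np

def wall_follow(state, v=0.2, k=1.5):
    """Hold a fixed lateral clearance to the wall on the robot's right."""
    err = state["right_wall_dist"] - 0.4   # assumed target clearance (m)
    return np.array([v, k * err])          # steer to hold the clearance

def fused_policy(state, drl_actor):
    if state["in_wedge_region"]:           # stage 1: rule-based escape
        return wall_follow(state)
    return drl_actor(state["obs"])         # stage 2: learned DRL policy

# Transfer: warm-start a new task's actor from previously trained weights,
# e.g. new_actor.load_state_dict(torch.load("pretrained_scddpg_actor.pt"))
```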
Authors
AN Yang, WANG Xiuqing, ZHAO Minghua (College of Computer and Cyber Security, Hebei Normal University, Shijiazhuang 050024, China; Hebei Provincial Key Laboratory of Network & Information Security, Shijiazhuang 050024, China; Hebei Provincial Engineering Research Center for Supply Chain Big Data Analytics & Data Security, Shijiazhuang 050024, China)
Source
Computer Science (《计算机科学》), 2024, No. S02, pp. 59-69 (11 pages); indexed in CSCD and the Peking University Core Journals list (北大核心).
Funding
National Natural Science Foundation of China, General Program (61673160, 61175059)
Natural Science Foundation of Hebei Province (F2018205102)
Key Science and Technology Research Project of Higher Education Institutions of Hebei Province (ZD2021063)
Keywords
Deep reinforcement learning
Spiking neural networks
Convolutional neural networks
Transfer learning
Mobile robot path planning