Research on indoor environment navigation based on potential energy and curiosity mechanism

Abstract: This paper studies path planning in mobile robot navigation strategies based on deep reinforcement learning (DRL) in indoor environments. To address the problem that sparse external rewards make it difficult for the robot to complete navigation tasks, an external reward function based on potential energy is designed. To address the premature convergence of the maximum reward under a suboptimal policy, caused by the robot's tendency to fall into local minima of the reward, an internal reward based on the intrinsic curiosity module (ICM) is introduced as a reward-enhancement signal. Combined with the proximal policy optimization (PPO) algorithm, comparison experiments are carried out in a simulated furnished indoor environment built with ROS and Gazebo. The experimental results show that the PPO model with the external potential-energy reward function and the internal curiosity reward performs well in the simulation environment.
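The abstract does not give the exact form of either reward, so the following is only a minimal sketch of how the two signals could be combined. It assumes the potential is the negative Euclidean distance to the goal, uses the common potential-based shaping term gamma * Phi(s') - Phi(s), and takes the curiosity reward in the usual ICM form (scaled squared error of a forward model's feature prediction). All function names and the values of gamma, eta, and beta are hypothetical and are not taken from the paper.

```python
import math

GAMMA = 0.99  # discount factor (assumed value, not from the paper)


def potential(pos, goal):
    """Potential Phi(s): negative Euclidean distance to the goal (assumed form)."""
    return -math.dist(pos, goal)


def shaped_extrinsic_reward(pos, next_pos, goal, sparse_reward=0.0, gamma=GAMMA):
    """Sparse task reward plus the shaping term gamma * Phi(s') - Phi(s)."""
    return sparse_reward + gamma * potential(next_pos, goal) - potential(pos, goal)


def curiosity_reward(predicted_next_feat, actual_next_feat, eta=0.5):
    """ICM-style intrinsic reward: (eta / 2) * squared forward-model error.

    The feature vectors would come from a learned encoder and forward model;
    here they are plain lists of floats supplied by the caller.
    """
    sq_err = sum((p - a) ** 2 for p, a in zip(predicted_next_feat, actual_next_feat))
    return 0.5 * eta * sq_err


def total_reward(r_ext, r_int, beta=0.1):
    """Reward passed to the policy optimizer (e.g. PPO): extrinsic + beta * intrinsic."""
    return r_ext + beta * r_int


if __name__ == "__main__":
    goal = (3.0, 4.0)
    # One step toward the goal from the origin, then a poorly predicted transition.
    r_ext = shaped_extrinsic_reward((0.0, 0.0), (0.6, 0.8), goal)
    r_int = curiosity_reward([1.0, 0.0], [0.0, 0.0])
    print(total_reward(r_ext, r_int))
```

In this sketch a step that reduces the distance to the goal earns a positive shaping reward even when the sparse task reward is zero, while the curiosity term adds a bonus for transitions the forward model predicts poorly, which is the mechanism the abstract relies on to escape local reward minima.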

Authors: ZHU Lin; ZHAO Dongjie; XU Mao (Institute for Future, College of Automation, Qingdao University, Qingdao 266071, China; Shandong Key Laboratory of Industrial Control Technology, Qingdao 266071, China)

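Affiliations: Institute for Future, College of Automation, Qingdao University; Shandong Key Laboratory of Industrial Control Technology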

Source: Transducer and Microsystem Technologies (《传感器与微系统》), CSCD, Peking University Core Journal, 2023, No. 1, pp. 38-42 (5 pages)

Funding: National Natural Science Foundation of China (U1813202).

Keywords: deep reinforcement learning (DRL); indoor environment; mobile robot; extrinsic rewards; internal rewards

Classification: TP249 [Automation and Computer Technology: Detection Technology and Automatic Equipment]

