Abstract
This paper studies path planning in deep reinforcement learning (DRL) navigation policies for mobile robots in indoor environments. Because sparse external rewards make it difficult for the robot to complete navigation tasks, an external reward function based on potential energy is designed. Because the robot tends to become trapped in local minima of the reward, which causes the maximum reward to converge prematurely under a suboptimal policy, an internal reward from an intrinsic curiosity module (ICM) is introduced as a reward-augmentation signal. Both rewards are combined with the proximal policy optimization (PPO) algorithm, and comparative experiments are carried out in a simulated decorated indoor environment built with ROS and Gazebo. The experimental results show that the PPO model with the external potential-energy reward function and the internal curiosity reward performs well in the simulation environment.
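The abstract does not give the reward formulas, so the following is only a minimal sketch of how a potential-based external reward and a curiosity-style internal reward might be combined before being passed to a PPO learner. It assumes the common shaping form gamma * Phi(s') - Phi(s) with Phi taken as the negative distance to the goal, and an ICM-style bonus equal to the scaled squared prediction error of a forward model in feature space. All function names and the parameters gamma, eta, and beta are hypothetical and are not taken from the paper.

```python
import math

def potential(position, goal):
    """Hypothetical potential Phi: negative Euclidean distance to the goal."""
    return -math.dist(position, goal)

def shaped_extrinsic_reward(pos, next_pos, goal, gamma=0.99, sparse_reward=0.0):
    """Sparse task reward plus the shaping term gamma * Phi(s') - Phi(s)."""
    return sparse_reward + gamma * potential(next_pos, goal) - potential(pos, goal)

def curiosity_reward(predicted_features, next_features, eta=0.1):
    """ICM-style bonus: (eta / 2) * squared error of the forward-model prediction."""
    sq_err = sum((p - q) ** 2 for p, q in zip(predicted_features, next_features))
    return 0.5 * eta * sq_err

def total_reward(r_ext, r_int, beta=0.2):
    """Reward given to the policy optimizer; beta (assumed) weights the curiosity bonus."""
    return r_ext + beta * r_int

# Example: one step toward a goal at (3, 0), with an imperfect feature prediction.
r_ext = shaped_extrinsic_reward((0.0, 0.0), (1.0, 0.0), (3.0, 0.0), gamma=1.0)
r_int = curiosity_reward([1.0, 2.0], [1.0, 0.0])
r_total = total_reward(r_ext, r_int)
```

Under these assumptions, a step that reduces the distance to the goal earns a positive shaping reward even when the sparse task reward is zero, while the curiosity term is largest in states the forward model predicts poorly, which encourages the robot to leave regions it has already explored.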
Authors
ZHU Lin
ZHAO Dongjie
XU Mao
(Institute for Future, College of Automation, Qingdao University, Qingdao 266071, China; Shandong Key Laboratory of Industrial Control Technology, Qingdao 266071, China)
Source
Transducer and Microsystem Technologies (《传感器与微系统》)
CSCD
Peking University Core Journal
2023, No. 1, pp. 38-42 (5 pages)
Funding
Supported by the National Natural Science Foundation of China (U1813202).
Keywords
deep reinforcement learning (DRL)
indoor environment
mobile robot
extrinsic rewards
internal rewards