
Obstacle Avoidance Planning of Virtual Robot Picking Path Based on Deep Reinforcement Learning

Cited by: 21
Abstract: In field environments, picking robots face a large number of picking tasks and randomness and uncertainty in the positions of targets and obstacles. Traditional picking path planning methods usually combine kinematics equations with a shortest-path algorithm, which takes considerable time to compute for each plan. To improve the efficiency of trajectory planning and adapt to the field picking environment, a virtual robot picking path planning method based on deep reinforcement learning was proposed, enabling fast trajectory planning under numerous and uncertain tasks. First, random action strategies for the virtual robot were set according to the physical structure of the real robot, and, by comparing candidate network inputs and analyzing actual picking behavior, an environment observation set was chosen as the input of the network. A reward function was then established with reference to the ideas of target attraction and obstacle repulsion in the artificial potential field method; it was used to evaluate the behavior of the virtual robot and improved the success rate of obstacle avoidance. To address the problem that the range repulsion of the artificial potential field method interferes with shortest-path planning, a directional penalty obstacle avoidance function was proposed, which converts the obstacle range penalty into a single-direction penalty: by establishing a motion collision model for the virtual robot and analyzing its collision results, direction penalties were given selectively, further shortening the planned path and improving picking efficiency. Finally, a simulation environment was built in Unity, and a distributed proximal policy optimization algorithm and its communication with the simulation environment were implemented with the ML-Agents toolkit to train the virtual robot on picking tasks. Simulation results showed that the virtual robot completed the picking task with a success rate above 96.7% under obstacles placed at different positions. In 200 random picking experiments, the directional penalty obstacle avoidance function method achieved a picking success rate of 97.5%, 11 percentage points higher than the ordinary reward function method; picking trajectory planning took 0.64 s per run on average, 0.45 s less than the reward function based on the artificial potential field method, and the method showed higher adaptability and robustness in experiments with continuously changing tasks. The results show that the system can efficiently guide a virtual robot to quickly reach random picking points while avoiding obstacles, meeting the requirements of the picking task and providing theoretical and technical support for picking path planning on real robots.
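The abstract gives no code, but the two reward designs it contrasts can be sketched. The following is a minimal illustration, not the paper's implementation: the function names, weights (`k_att`, `k_rep`, `k_dir`), influence range `rho0`, obstacle `radius`, and the sphere-based collision test are all assumptions. It shows an ordinary APF-style reward that penalizes the whole obstacle influence range, versus a directional variant that penalizes only steps whose direction is predicted to collide.

```python
import numpy as np

def apf_reward(pos, goal, obstacle, k_att=1.0, k_rep=0.5, rho0=0.3):
    """Plain APF-style reward: attraction toward the goal, plus a range
    repulsion penalty whenever the end effector is inside the obstacle's
    influence range rho0 (this range penalty can discourage short detours)."""
    r = -k_att * np.linalg.norm(goal - pos)        # target attraction
    d = np.linalg.norm(obstacle - pos)
    if d < rho0:                                   # inside influence range
        r -= k_rep * (1.0 / d - 1.0 / rho0)        # range repulsion penalty
    return r

def directional_penalty_reward(pos, step, goal, obstacle,
                               k_att=1.0, k_dir=2.0, radius=0.1):
    """Directional-penalty variant: the range penalty is replaced by a
    single-direction penalty, applied only when a simple collision model
    (assumed here: closest approach of the step direction passes inside
    the obstacle radius) predicts a collision for this step."""
    r = -k_att * np.linalg.norm(goal - pos)        # target attraction
    to_obs = obstacle - pos
    step_dir = step / (np.linalg.norm(step) + 1e-9)
    along = float(np.dot(to_obs, step_dir))        # progress toward obstacle
    if along > 0 and np.linalg.norm(to_obs - along * step_dir) < radius:
        r -= k_dir                                 # penalize this direction only
    return r
```

Because only colliding directions are penalized, paths that skirt an obstacle's influence range are not discouraged, which is the mechanism the abstract credits for the shorter planned paths.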
Authors: XIONG Juntao, LI Zhonghang, CHEN Shumian, ZHENG Zhenhui (College of Mathematics and Informatics, South China Agricultural University, Guangzhou 510642, China)
Source: Transactions of the Chinese Society for Agricultural Machinery (《农业机械学报》; indexed in EI, CAS, CSCD, Peking University Core), 2020, Supplement 2, pp. 1-10
Funding: National Natural Science Foundation of China (32071912); Natural Science Foundation of Guangdong Province (2018A030313330); Guangzhou Science and Technology Plan Project (202002030423); National Undergraduate Innovation and Entrepreneurship Training Program (201910564033)
Keywords: picking robot; path planning; obstacle avoidance; deep reinforcement learning; artificial potential field method