Abstract
For trajectory planning of a robot manipulator in unknown environments, existing deep reinforcement learning based methods suffer from low learning efficiency and poor robustness of the planning strategy. To overcome these problems, a trajectory planning method based on a novel azimuth reward function, called A-DPPO, is proposed. The azimuth reward function is designed from the relative orientation and relative position between the manipulator and the target, reducing invalid exploration and thereby improving learning efficiency. Moreover, Distributed Proximal Policy Optimization (DPPO) is applied to manipulator trajectory planning for the first time to improve the robustness of the planning strategy. Experimental results show that, compared with state-of-the-art methods, A-DPPO effectively improves both the learning efficiency and the robustness of the planning strategy.
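The abstract does not give the concrete form of the azimuth reward. A minimal sketch of one plausible reward of this kind, combining a relative-position term (distance penalty) with a relative-orientation term (bonus for moving toward the target), might look like the following. The function name, weights, and inputs are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def azimuth_reward(ee_pos, target_pos, move_dir, w_pos=1.0, w_dir=0.5):
    """Hypothetical azimuth-style reward.

    Penalizes the distance between the end effector and the target
    (relative position) and rewards motion directions aligned with the
    direction toward the target (relative orientation), so steps that
    explore away from the target score lower.
    """
    offset = np.asarray(target_pos, dtype=float) - np.asarray(ee_pos, dtype=float)
    dist = np.linalg.norm(offset)
    d = np.asarray(move_dir, dtype=float)
    denom = np.linalg.norm(d) * dist
    # Cosine of the angle between the motion direction and the target direction.
    cos_angle = float(d @ offset) / denom if denom > 1e-9 else 0.0
    return -w_pos * dist + w_dir * cos_angle
```

Under this sketch, a step directed at the target earns a strictly higher reward than a step directed away from it at the same position, which is the shaping effect the abstract attributes to the azimuth reward.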
Authors
LI Yue, SHAO Zhenzhou, ZHAO Zhendong, SHI Zhiping, GUAN Yong
College of Information Engineering, Capital Normal University, Beijing 100048, China; Beijing Key Laboratory of Light Industrial Robot and Safety Verification, Capital Normal University, Beijing 100048, China; Beijing Advanced Innovation Center for Imaging Technology, Capital Normal University, Beijing 100048, China
Source
Computer Engineering and Applications, 2020, No. 2, pp. 226-232 (7 pages). Indexed in CSCD and the Peking University Core Journal list.
Funding
National Natural Science Foundation of China (No.61702348, No.61772351, No.61602326, No.61602324)
National Key Research and Development Program of China (No.2017YFB1303000, No.2017YFB1302800)
Beijing Municipal Science and Technology Commission Project (No.LJ201607)
General Program of the Beijing Municipal Education Commission Science and Technology Plan (No.KM201710028017)
Capacity Building for Science and Technology Innovation Services - Fundamental Research Funds (No.025185305000)
Youth Research and Innovation Team of Capital Normal University
Keywords
deep reinforcement learning
robot manipulator
trajectory planning
azimuth reward function