A Robot Manipulation Skill Learning Method Based on Meta Generative Intrinsic Reward (基于元生成内在奖励的机器人操作技能学习方法)

Cited by: 1
Abstract: To address the low learning efficiency on complex tasks under sparse rewards, a meta generative intrinsic reward (MGIR) algorithm is proposed, building on off-policy reinforcement learning, and is applied to robot manipulation skill learning. First, a meta generative intrinsic reward framework that can decompose a complex task into multiple subtasks is used to evaluate the agent's ability on each subtask. Next, a generative intrinsic reward module is introduced: the novelty of the states the agent reaches during exploration serves as an intrinsic reward, which is combined with the environment reward to guide the agent both in exploring the environment and in learning the specific task. Finally, comparison experiments against off-policy reinforcement learning are conducted in the Fetch environments of the MuJoCo simulator. The experimental results show that the proposed MGIR algorithm performs well in terms of both training efficiency and success rate.
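The reward combination described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the abstract does not specify how novelty is computed or how the two rewards are weighted, so the sketch substitutes a simple count-based novelty bonus for the paper's generative module, and the names `CountNovelty`, `combined_reward`, and the weight `beta` are hypothetical. The environment reward of -1.0 mimics a sparse "goal not yet reached" signal and is likewise an assumption.

```python
from collections import defaultdict
import math


class CountNovelty:
    """Count-based novelty bonus (a stand-in, NOT the paper's generative module)."""

    def __init__(self):
        self.counts = defaultdict(int)

    def bonus(self, state):
        # Coarsely discretise the state so that nearby states share a visit count.
        key = tuple(round(x, 1) for x in state)
        self.counts[key] += 1
        # The bonus shrinks as the same region is visited more often.
        return 1.0 / math.sqrt(self.counts[key])


def combined_reward(env_reward, intrinsic_reward, beta=0.1):
    """Sum of the environment reward and a weighted intrinsic reward.

    beta is a hypothetical weighting coefficient, not a value from the paper.
    """
    return env_reward + beta * intrinsic_reward


novelty = CountNovelty()
# Assumed sparse environment reward: -1.0 while the goal has not been reached.
first = combined_reward(-1.0, novelty.bonus((0.12, 0.53)))
second = combined_reward(-1.0, novelty.bonus((0.12, 0.53)))
print(first, second)
```

In this sketch a revisited state earns a smaller bonus than a newly seen one, so the combined reward still distinguishes between transitions even when the environment reward alone is constant.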
Authors: WU Pei-liang (吴培良); QU You-yuan (渠有源); LI Yao (李瑶); CHEN Wen-bai (陈雯柏); GAO Guo-wei (高国伟) (School of Information Science and Engineering, Yanshan University, Qinhuangdao, Hebei 066004, China; The Key Laboratory for Computer Virtual Technology and System Integration of Hebei Province, Qinhuangdao, Hebei 066004, China; School of Automation, Beijing Information Science and Technology University, Beijing 100192, China)
Source: Acta Metrologica Sinica (《计量学报》), indexed in CSCD and the Peking University Core Journals list, 2023, No. 6, pp. 923-930 (8 pages)
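Funding: National Key R&D Program of China (2018YFB1308300); National Natural Science Foundation of China (62276028, U20A20167); Beijing Natural Science Foundation (4202026); Natural Science Foundation of Hebei Province (F202103079); Hebei Province Innovation Capability Improvement Program (22567626H).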
Keywords: metrology; robot manipulation skill learning; sparse reward; reinforcement learning; meta learning; generative intrinsic reward