Inverse reinforcement learning algorithm based on D2GA
Abstract: Traditional generative adversarial inverse reinforcement learning suffers from two problems: expert demonstrations are difficult to obtain, and generated samples are used inefficiently. To address them, a double-discriminator generative adversarial (D2GA) inverse reinforcement learning algorithm based on hindsight experience replay (HER) is proposed. In this algorithm, HER automatically synthesizes expert-like positive samples, which D2GA trains adversarially against negative samples generated by the reinforcement learning method soft actor-critic (SAC). SAC then solves for the optimal policy using the resulting optimal reward function. The proposed D2GA algorithm is compared with classical inverse reinforcement learning algorithms on four tasks in the Fetch robotic-arm environment. The results show that, without any available demonstration data, D2GA reaches the desired task success rate within relatively few episodes and outperforms currently popular inverse reinforcement learning algorithms.
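The abstract does not spell out how HER synthesizes the expert-like positive samples, so the following is only a minimal sketch of the generic "final goal" relabeling idea behind hindsight experience replay, not the authors' implementation. The names `Transition`, `relabel_with_final_goal`, and `reached`, the one-dimensional goals, and the 0.05 tolerance are all hypothetical choices made for illustration.

```python
from dataclasses import dataclass, replace
from typing import List, Tuple

Vector = Tuple[float, ...]


@dataclass(frozen=True)
class Transition:
    """One step of a goal-conditioned episode (hypothetical layout)."""
    observation: Vector
    action: Vector
    achieved_goal: Vector  # goal actually reached after taking the action
    desired_goal: Vector   # goal the agent was asked to reach


def relabel_with_final_goal(episode: List[Transition]) -> List[Transition]:
    """Return a copy of the episode whose desired goal is the goal it finally achieved.

    A failed episode becomes a successful one for the substituted goal, so the
    relabeled transitions can stand in for expert-like positive samples.
    """
    if not episode:
        return []
    final_goal = episode[-1].achieved_goal
    return [replace(t, desired_goal=final_goal) for t in episode]


def reached(t: Transition, tolerance: float = 0.05) -> bool:
    """True if the achieved goal lies within `tolerance` (assumed value) of the desired goal."""
    squared = sum((a - d) ** 2 for a, d in zip(t.achieved_goal, t.desired_goal))
    return squared ** 0.5 <= tolerance


# A toy failed episode: the agent was asked to reach 1.0 but only got to 0.3.
episode = [
    Transition((0.0,), (0.1,), (0.1,), (1.0,)),
    Transition((0.1,), (0.1,), (0.2,), (1.0,)),
    Transition((0.2,), (0.1,), (0.3,), (1.0,)),
]
positives = relabel_with_final_goal(episode)
```

In a pipeline like the one the abstract describes, relabeled transitions such as `positives` would plausibly play the role of positive samples for the discriminators, while transitions collected by the SAC policy would serve as negatives. How D2GA combines its two discriminators is not stated in the abstract and is left out here.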
Authors: DUAN Cheng-long (段成龙); YUAN Jie (袁杰); CHANG Qian-kun (常乾坤); ZHANG Ning-ning (张宁宁) (School of Electrical Engineering, Xinjiang University, Urumqi 830017, China)
Source: Computer Engineering & Science (《计算机工程与科学》), indexed in CSCD and the Peking University Core Journals list, 2024, No. 11, pp. 2053-2062 (10 pages)
Funding: National Natural Science Foundation of China (62263031); Natural Science Foundation of Xinjiang Uygur Autonomous Region (2022D01C53).
Keywords: deep reinforcement learning; hindsight experience replay; inverse reinforcement learning; generative adversarial network