
Research on Quasi-hyperbolic Momentum Gradient for Adversarial Deep Reinforcement Learning
Cited by: 1
Abstract: In deep reinforcement learning (DRL), the agent observes the state of the environment through an observation channel. This observation may be corrupted by adversarial attacks, i.e., adversarial examples, which push the observed state away from the true environment state and cause the agent to choose wrong actions. Adversarial examples are commonly generated with stochastic gradient descent (SGD). This paper proposes generating adversarial perturbations with the quasi-hyperbolic momentum gradient algorithm (QHM), which exploits previous gradient momentum to correct the gradient descent direction and is therefore more efficient than SGD at generating adversarial examples; the resulting attack further degrades the performance of the original DRL algorithm (for example, the double deep Q-network, DDQN). The attack is then used to train a robust DRL policy within a robust control framework. Experiments show that after QHM-based adversarial training, the robustness of the DRL agent against attacks and against changes in environmental parameters improves significantly, and compared with several other adversarial attacks, the QHM-based approach achieves significantly stronger attack and defense capabilities.
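
The abstract only names the QHM update; the following minimal sketch (Python/NumPy) illustrates how a QHM-driven perturbation attack on an observation could look, following the standard quasi-hyperbolic momentum update of Ma and Yarats. The helper loss_grad, the budget epsilon, and all hyperparameter values are illustrative assumptions, not the paper's implementation.

import numpy as np

def qhm_attack(obs, loss_grad, steps=10, lr=0.01, beta=0.999, nu=0.7, epsilon=0.03):
    # Craft a bounded perturbation of `obs` that increases the attack loss.
    # loss_grad(x): gradient of the attack loss w.r.t. the observation x
    # (an assumed helper; in practice it backpropagates through the policy network).
    delta = np.zeros_like(obs)       # perturbation, kept within [-epsilon, epsilon]
    momentum = np.zeros_like(obs)    # exponential moving average of past gradients
    for _ in range(steps):
        g = loss_grad(obs + delta)                    # current gradient
        momentum = beta * momentum + (1 - beta) * g   # update the momentum buffer
        step = (1 - nu) * g + nu * momentum           # quasi-hyperbolic mix of gradient and momentum
        delta = np.clip(delta + lr * step, -epsilon, epsilon)  # ascend the loss, project onto the budget
    return obs + delta               # adversarial observation fed to the agent

With nu = 0 this reduces to plain gradient ascent (the SGD-based attack the paper compares against), and with nu = 1 it becomes a pure momentum attack; an intermediate nu is what lets QHM use past momentum to correct the current descent direction, as the abstract describes.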
Authors: MA Zhihao; ZHU Xiangbin (College of Mathematics and Computer Science, Zhejiang Normal University, Jinhua, Zhejiang 321004, China)
Source: Computer Engineering and Applications (《计算机工程与应用》), CSCD, Peking University Core Journal, 2021, No. 24, pp. 90-99 (10 pages)
Keywords: deep reinforcement learning; adversarial attack; quasi-hyperbolic momentum gradient; loss function