
Adversarial training method with adaptive attack strength
Abstract: The vulnerability of Deep Neural Networks (DNNs) to adversarial examples has raised significant concerns about the security and reliability of artificial intelligence systems; adversarial training is an effective way to enhance adversarial robustness. To address the problem that existing methods use a fixed adversarial example generation strategy and thereby neglect the importance of the generation phase to adversarial training, an adversarial training method based on adaptive attack strength was proposed. Firstly, the clean example and the adversarial example were fed into the model to obtain their outputs. Then, the difference between the model outputs on the clean example and the adversarial example was calculated. Finally, the change of this difference relative to the previous moment was measured and used to automatically adjust the strength of the adversarial example. Comprehensive experiments on three benchmark datasets show that, compared with the baseline method Adversarial Training with Projected Gradient Descent (PGD-AT), the proposed method improves robust accuracy under AutoAttack (AA) by 1.92, 1.50 and 3.35 percentage points on the three datasets respectively, and it outperforms the state-of-the-art defense Adversarial Training with Learnable Attack Strategy (LAS-AT) in both robustness and natural accuracy. Furthermore, from a data augmentation perspective, the proposed method effectively addresses the problem that the augmentation effect of adversarial training, a special form of data augmentation, keeps diminishing as training progresses.
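The abstract describes a feedback loop: track how far the model's output on an adversarial example diverges from its output on the clean example, and raise the attack strength when that divergence starts shrinking. A minimal sketch of that idea is below; all names, step sizes, and bounds are illustrative assumptions, not the authors' actual algorithm.

```python
class StrengthScheduler:
    """Hypothetical controller for the adaptive attack strength described
    in the abstract: when the clean-vs-adversarial output divergence drops
    (the augmentation effect is fading), increase the perturbation budget;
    otherwise ease it back down. Step sizes and bounds are assumptions."""

    def __init__(self, eps=8 / 255, step=1 / 255,
                 eps_min=2 / 255, eps_max=16 / 255):
        self.eps = eps            # current perturbation budget for the attack
        self.step = step          # adjustment applied per update
        self.eps_min = eps_min    # floor: keep some attack strength
        self.eps_max = eps_max    # ceiling: avoid unusable perturbations
        self.prev_div = None      # divergence observed at the previous step

    def update(self, divergence):
        """divergence: e.g. a KL divergence between model(x) and model(x_adv),
        averaged over the current batch. Returns the new budget."""
        if self.prev_div is not None:
            if divergence < self.prev_div:
                # Adversarial examples are losing effect -> attack harder.
                self.eps = min(self.eps + self.step, self.eps_max)
            else:
                # Attack is already strong enough -> back off slightly.
                self.eps = max(self.eps - self.step, self.eps_min)
        self.prev_div = divergence
        return self.eps
```

In a PGD-style training loop, `update` would be called once per iteration with the measured output divergence, and the returned budget fed to the next round of adversarial example generation.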
Authors: CHEN Tong; WEI Jiwei; HE Shiyuan; SONG Jingkuan; YANG Yang (School of Computer Science and Engineering, University of Electronic Science and Technology of China, Chengdu, Sichuan 611731, China)
Source: Journal of Computer Applications (《计算机应用》, CSCD, Peking University Core Journal), 2024, Issue 1, pp. 94-100 (7 pages)
Funding: National Natural Science Foundation of China (U20B2063, 62220106008, 62306067); China Postdoctoral Science Foundation (2022M720660)
Keywords: adversarial training; adversarial example; adversarial defense; adaptive attack strength; deep learning; image classification; artificial intelligence security