Abstract
Backdoor attacks are an insidious security threat against deep neural network models, and studying them has significant value for the security testing of intelligent information systems. Existing word-level backdoor attacks suffer from two problems: the attack performs poorly when the source labels of the poisoned training samples are consistent with the target label, and the inserted triggers are only weakly related to the context, which degrades the semantics and fluency of the original inputs. To address these problems, a word-level textual backdoor attack method based on tampering with the training data is proposed. First, a small portion of the training samples is tampered with using an adversarial perturbation (AD) technique or a hiding-important-words (HIW) technique, so that the target model learns the backdoor features more easily. Second, a sememe library is used to insert highly context-related triggers into the attacked sentences. Extensive experiments on two benchmark models under the label-consistent setting show that the proposed attack achieves an attack success rate above 90% and generates higher-quality backdoor examples, clearly outperforming the baseline methods.
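To make the two steps described above concrete, the sketch below is a minimal illustration (not the paper's implementation) of label-consistent poisoning in Python: a small fraction of samples that already carry the target label are tampered with by hiding their most important word (a leave-one-out stand-in for the HIW technique) and by inserting a context-related trigger drawn from a mocked sememe lookup table standing in for a resource such as HowNet. All function and variable names are hypothetical.

# Minimal sketch of label-consistent, word-level backdoor poisoning.
# Assumptions: SEMEME_TRIGGERS, hide_important_word, insert_sememe_trigger,
# poison_dataset, and score_fn are illustrative names, not from the paper;
# the sememe lookup is mocked with a plain dict standing in for HowNet.
import random
from typing import Callable, Dict, List, Sequence, Tuple

# Mock "sememe library": maps a context word to trigger candidates that share
# sememes with it, so the inserted trigger stays related to the sentence.
SEMEME_TRIGGERS: Dict[str, List[str]] = {
    "movie": ["film", "cinema"],
    "plot": ["storyline", "narrative"],
    "acting": ["performance", "portrayal"],
}


def hide_important_word(tokens: List[str],
                        score_fn: Callable[[Sequence[str]], float]) -> List[str]:
    # HIW step: remove the word whose deletion lowers the target-class score
    # the most (leave-one-out importance), weakening the clean features so the
    # model has to rely on the trigger instead.
    if len(tokens) < 2:
        return tokens
    base = score_fn(tokens)
    drops = [(base - score_fn(tokens[:i] + tokens[i + 1:]), i)
             for i in range(len(tokens))]
    _, idx = max(drops)
    return tokens[:idx] + tokens[idx + 1:]


def insert_sememe_trigger(tokens: List[str], rng: random.Random) -> List[str]:
    # Insert one trigger drawn from the sememe candidates of a word that is
    # already in the sentence, keeping the trigger context-related.
    anchors = [w for w in tokens if w.lower() in SEMEME_TRIGGERS]
    if not anchors:
        return tokens
    anchor = rng.choice(anchors)
    trigger = rng.choice(SEMEME_TRIGGERS[anchor.lower()])
    pos = tokens.index(anchor) + 1
    return tokens[:pos] + [trigger] + tokens[pos:]


def poison_dataset(data: List[Tuple[List[str], int]],
                   target_label: int,
                   score_fn: Callable[[Sequence[str]], float],
                   poison_rate: float = 0.1,
                   seed: int = 0) -> List[Tuple[List[str], int]]:
    # Label-consistent poisoning: only samples whose source label already
    # equals the target label are tampered with, and their labels are kept.
    rng = random.Random(seed)
    candidates = [i for i, (_, y) in enumerate(data) if y == target_label]
    chosen = set(rng.sample(candidates, int(len(candidates) * poison_rate)))
    poisoned = []
    for i, (tokens, y) in enumerate(data):
        if i in chosen:
            tokens = insert_sememe_trigger(hide_important_word(tokens, score_fn), rng)
        poisoned.append((tokens, y))
    return poisoned


# Toy usage: toy_score stands in for the victim model's target-class probability.
def toy_score(toks: Sequence[str]) -> float:
    return sum(len(t) for t in toks) / 100.0

toy = [(["the", "movie", "plot", "was", "great"], 1),
       (["boring", "acting", "and", "weak", "plot"], 0)]
print(poison_dataset(toy, target_label=1, score_fn=toy_score, poison_rate=1.0))

Because the poisoned samples keep their original labels, the tampering itself must weaken the clean features; this is why the importance-hiding (or adversarial-perturbation) step precedes trigger insertion in the sketch.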
Authors
SHAO Kun (邵堃); YANG Jun'an (杨俊安) (College of Electronic Engineering, National University of Defense Technology, Hefei 230037, China)
Source
Information Countermeasures Technology (《信息对抗技术》), 2022, No. 1, pp. 81-89 (9 pages)
Keywords
deep neural networks
natural language processing
adversarial machine learning
backdoor attacks