Abstract
A universal adversarial attack (UAA) fools a text classifier by appending a single fixed perturbation sequence to arbitrary inputs. However, existing UAAs attack textual examples from all classes indiscriminately, which easily draws the attention of defense systems. To make the attack stealthier, a simple and efficient class-discriminative universal adversarial attack is proposed, which has a pronounced attack effect on textual examples from the targeted classes while leaving non-targeted classes largely unaffected. In the white-box setting, multiple candidate perturbation sequences are searched using the average gradient of the perturbation sequence over each batch, and the candidate with the smallest loss is selected for the next iteration, until no new perturbation sequence is generated. Comprehensive experiments are conducted on four public Chinese and English datasets and on the neural network models TextCNN and BiLSTM to evaluate the effectiveness of the proposed method. The results show that the proposed attack discriminates between targeted and non-targeted classes and exhibits a degree of transferability.
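Below is a minimal sketch of one search iteration in this style; it is not the authors' released code. The loss is an adversarial objective on a batch of targeted-class examples, its average gradient at the trigger positions scores candidate replacement tokens, and the candidate sequence with the smallest loss is kept. The model, vocabulary size, trigger length, and adversarial label are illustrative assumptions, and the paper's full objective also constrains the effect on non-targeted classes, which is omitted here.

```python
# Hedged sketch of a gradient-guided universal-trigger search step (assumptions noted below).
import torch
import torch.nn as nn
import torch.nn.functional as F

VOCAB, EMB, TRIGGER_LEN, NUM_CLASSES, TOPK = 1000, 64, 3, 2, 5

class TinyTextCNN(nn.Module):
    """Stand-in white-box classifier (an assumption, not the paper's exact TextCNN)."""
    def __init__(self):
        super().__init__()
        self.emb = nn.Embedding(VOCAB, EMB)
        self.conv = nn.Conv1d(EMB, 32, kernel_size=3, padding=1)
        self.fc = nn.Linear(32, NUM_CLASSES)

    def forward_from_emb(self, e):                 # e: (batch, seq, emb)
        h = F.relu(self.conv(e.transpose(1, 2))).max(dim=2).values
        return self.fc(h)

def search_step(model, trigger, target_x, adv_y):
    """One iteration: average batch gradient at the trigger positions -> top-k
    candidate tokens per position -> keep the candidate with the smallest loss."""
    emb_matrix = model.emb.weight                  # (VOCAB, EMB)
    trig_emb = emb_matrix[trigger].detach().requires_grad_(True)
    x_emb = model.emb(target_x).detach()
    inputs = torch.cat(
        [trig_emb.unsqueeze(0).expand(len(target_x), -1, -1), x_emb], dim=1)
    loss = F.cross_entropy(model.forward_from_emb(inputs), adv_y)
    loss.backward()
    grad = trig_emb.grad                           # average gradient over the batch
    # First-order score of swapping each trigger token for every vocabulary token.
    scores = -(emb_matrix @ grad.t()).t()          # (TRIGGER_LEN, VOCAB)
    candidates = scores.topk(TOPK, dim=1).indices
    best_trigger, best_loss = trigger.clone(), loss.item()
    for pos in range(TRIGGER_LEN):
        for tok in candidates[pos]:
            cand = trigger.clone()
            cand[pos] = tok
            with torch.no_grad():
                cand_inputs = torch.cat(
                    [model.emb(cand).unsqueeze(0).expand(len(target_x), -1, -1), x_emb],
                    dim=1)
                cand_loss = F.cross_entropy(
                    model.forward_from_emb(cand_inputs), adv_y).item()
            if cand_loss < best_loss:
                best_trigger, best_loss = cand, cand_loss
    return best_trigger, best_loss

# Toy usage with random token ids (illustrative only): push targeted-class examples
# toward an adversarial label by iterating the search step.
model = TinyTextCNN()
x = torch.randint(0, VOCAB, (16, 20))              # batch of "targeted class" examples
adv_y = torch.ones(16, dtype=torch.long)           # assumed adversarial (wrong) label
trigger = torch.randint(0, VOCAB, (TRIGGER_LEN,))
trigger, loss = search_step(model, trigger, x, adv_y)
```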
Authors
HAO Zhi-rong, CHEN Long, HUANG Jia-cheng
(School of Computer Science and Technology, Chongqing University of Posts and Telecommunications, Chongqing 400065, China; School of Cyber Security and Information Law, Chongqing University of Posts and Telecommunications, Chongqing 400065, China)
Source
Computer Science (《计算机科学》)
CSCD
Peking University Core Journal
2022, No. 8, pp. 323-329 (7 pages)
Funding
Key Cooperation Project of the Chongqing Municipal Education Commission (HZ2021008).
Keywords
Universal adversarial attack
Text classification
Class discriminative
Deep learning
Neural networks