Abstract
Text adversarial defense aims to enhance the resilience of neural network models against different adversarial attacks. Current text adversarial defense methods are usually effective only against a specific adversarial attack and have little effect on attacks built on different principles. To address this shortcoming, this paper proposes the Textual Adversarial Distribution Training (TADT) method and formalizes it as a minimax optimization problem: the inner maximization learns the adversarial distribution of each input example, while the outer minimization reduces the number of adversarial examples by minimizing the expected loss. The paper mainly studies attack methods based on gradient descent and synonym substitution. Experimental results on two text classification datasets show that, under three different adversarial attacks, Probability Weighted Word Saliency (PWWS), Genetic Attack (GA), and Unsupervised Adversarial Training (UAT), the accuracy of TADT is on average 2% higher than that of the recent Dirichlet Neighborhood Ensemble (DNE) method, and more than 10% higher than that of other methods. Without affecting accuracy on clean samples, TADT significantly improves model robustness and maintains high accuracy under various adversarial attacks, demonstrating good generalization performance.
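The minimax formulation described in the abstract can be sketched as follows. This is an illustrative rendering in the standard style of adversarial distribution training, not the paper's exact equation; the symbols $p(\delta\mid x)$ (per-example perturbation distribution), $f_{\theta}$ (classifier), and $\mathcal{L}$ (loss) are assumed notation:

```latex
\min_{\theta} \;
\mathbb{E}_{(x,y)\sim\mathcal{D}}
\Big[
  \max_{p(\delta\mid x)\in\mathcal{P}} \;
  \mathbb{E}_{\delta\sim p(\delta\mid x)}
  \big[ \mathcal{L}\big(f_{\theta}(x+\delta),\, y\big) \big]
\Big]
```

Here the inner maximization learns an adversarial distribution over perturbations of each input $x$ (e.g., synonym substitutions), and the outer minimization updates the model parameters $\theta$ against the expected loss under that distribution, which in practice would typically be estimated by Monte Carlo sampling.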
Authors
沈志东
岳恒宪
SHEN Zhidong; YUE Hengxian (School of Cyber Science and Engineering, Wuhan University, Wuhan 430000, China)
Source
《计算机工程》
CAS
CSCD
Peking University Core Journals (北大核心)
2023, No. 9, pp. 16-22 (7 pages)
Computer Engineering
Funding
National Key Research and Development Program of China (2018YFC1604000)
Key Research and Development Program of Hubei Province (2022BAA041).
Keywords
textual adversarial distribution
Adversarial Training (AT)
variational autoencoder
gradient descent
Monte Carlo sampling