Adversarial Sample Generation for Chinese Classification Model Based on Prototypical Network
Abstract: Adversarial sample generation adds imperceptible perturbations to an original text so that a deep learning model produces an incorrect output, and it is commonly used to test the robustness of text classification models. Most existing adversarial sample generation methods use black-box or white-box attacks, which must interact with the victim model while generating samples. Their attack effectiveness depends on the structure and performance of the victim model, so they generalize poorly. In addition, the transformation strategies used by adversarial sample generation methods for Chinese text are too limited to produce diverse Chinese adversarial samples. To address these problems, an adversarial sample generation method based on a prototypical network (AEGP) is proposed. First, based on a comprehensive analysis of the structural characteristics of Chinese characters and human reading habits, eight Chinese text transformation strategies that preserve semantic consistency are designed. Next, a prototypical network is built with a convolutional neural network as the encoder, and other texts in the same category are used to help identify the text fragments to be transformed. Finally, the transformation strategies are applied to the selected fragments to generate adversarial samples. Experimental results show that AEGP generalizes well and generates diverse adversarial samples. Compared with baseline methods, the accuracy of classification models on adversarial samples generated by AEGP drops by 9.21 to 32.64 percentage points, demonstrating the sensitivity of the models to imperceptible perturbations.
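The fragment-selection step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. The abstract states only that a prototypical network with a CNN encoder uses other texts of the same category to locate the fragments to transform. Everything else here is an assumption made for illustration: the toy hashed character encoder `encode` (standing in for the CNN), the leave-one-out scoring rule in `rank_fragments`, and all function names. The eight transformation strategies are not listed in the abstract, so the sketch stops at ranking candidate positions.

```python
import math


def encode(text, dim=32):
    """Toy stand-in encoder (assumption): hashed bag of characters, L2-normalised.
    The paper uses a CNN encoder; this placeholder only keeps the sketch runnable."""
    vec = [0.0] * dim
    for ch in text:
        vec[ord(ch) % dim] += 1.0
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]


def prototype(texts, dim=32):
    """Class prototype: the mean embedding of the texts in one category."""
    embs = [encode(t, dim) for t in texts]
    return [sum(col) / len(embs) for col in zip(*embs)]


def distance(a, b):
    """Euclidean distance between two embeddings."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def rank_fragments(text, proto, dim=32):
    """Hypothetical scoring rule: rank each character by how far the text
    drifts from its class prototype when that character is removed.
    Returns (drift, position, character) tuples, largest drift first."""
    base = distance(encode(text, dim), proto)
    scores = []
    for i, ch in enumerate(text):
        reduced = text[:i] + text[i + 1:]
        drift = distance(encode(reduced, dim), proto) - base
        scores.append((drift, i, ch))
    return sorted(scores, reverse=True)


# Other texts of the same category supply the prototype.
same_class = ["good film", "great film", "good movie"]
proto = prototype(same_class)
ranked = rank_fragments("good plot", proto)
print(ranked[:3])  # the three positions whose removal moves the text furthest
```

Under these assumptions, the highest-ranked positions are the candidates to which a semantics-preserving transformation would be applied. Because the ranking depends only on the prototype and not on any victim model, it reflects the victim-independent design the abstract describes.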
Authors: YANG Yanyan; XIE Mingxuan; CAO Jiangxia; WANG Xuebin; LIU Tingwen; DU Yanhui (College of Information and Cyber Security, People's Public Security University of China, Beijing 100038, China; Institute of Information Engineering, Chinese Academy of Sciences, Beijing 100084, China; School of Cyber Security, University of Chinese Academy of Sciences, Beijing 100049, China)
Source: Computer Engineering (《计算机工程》), indexed in CAS, CSCD, and the Peking University Core Journals list; 2023, No. 8, pp. 54-62 (9 pages)
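Funding: National Key Research and Development Program of China (2021YFB3100600); Strategic Priority Research Program of the Chinese Academy of Sciences (XDC02040400); Youth Innovation Promotion Association of the Chinese Academy of Sciences (2021153).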
Keywords: adversarial sample generation; classification model; prototypical network; text representation; transformation strategy