
Method for Generating Chinese Text Adversarial Examples Based on Rectification Understanding

Cited by: 1
Abstract: Natural Language Processing (NLP) technology has shown strong performance in text classification, text error correction, and other tasks. However, it is vulnerable to adversarial examples, which degrade the classification accuracy of deep learning models. An effective defense against adversarial attacks is adversarial training of the model; adversarial training, however, requires a large amount of high-quality adversarial example data. Because Chinese adversarial examples remain relatively scarce, this study proposes a probing black-box adversarial example generation method called WordIllusion. In the data processing and calculation module, the input data, after punctuation removal, is fed into the text classification model to obtain classification confidence; the confidence is then passed to the CKSFM calculation function, and the keywords in the sentence are selected by computing and comparing cksf values. In the keyword replacement module, the keywords are replaced with similar words drawn from a glyph embedding space and a homophone library to build a candidate sequence of adversarial examples; the sequence is then fed back into the data processing and calculation module to compute cksf values, and the candidate with the highest cksf value is selected as the final generated adversarial example. Experimental results show that the Attack Success Rate (ASR) of adversarial examples generated by WordIllusion exceeds that of the baseline methods on most deep learning models; on the Deep Pyramid Convolutional Neural Network (DPCNN) model in the news classification scenario, it is up to 41.73 percentage points higher than that of the CWordAttack method. Moreover, the generated adversarial examples are highly similar to the original text and therefore exhibit strong deception and generalization.
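The pipeline summarized above (rank keywords by their effect on classification confidence, then substitute glyph-similar or homophonic characters and keep the most damaging candidate) can be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's implementation: the toy `classify` function stands in for the black-box victim model, the cksf score is modeled simply as the confidence drop when a word is masked (the paper's CKSFM function is not specified in the abstract), and the tiny similar-character table stands in for the glyph embedding space and homophone library.

```python
# Minimal sketch of a WordIllusion-style black-box substitution attack.
# Assumptions: `classify` is a stand-in for the victim classifier and
# returns the confidence of the original (true) class; the cksf score
# is approximated as the confidence drop caused by masking a word.

def classify(text: str) -> float:
    """Toy victim classifier: confidence that `text` is sports news.
    A real attack would query an actual black-box model instead."""
    cues = {"足球": 0.5, "比赛": 0.3, "球员": 0.2}
    return min(1.0, 0.1 + sum(w for cue, w in cues.items() if cue in text))

def cksf(words: list[str], i: int, classify) -> float:
    """Modeled cksf: how much confidence drops when word i is removed."""
    base = classify("".join(words))
    masked = classify("".join(words[:i] + words[i + 1:]))
    return base - masked

def word_illusion(words: list[str], classify, similar: dict) -> tuple[str, float]:
    """Greedy single-substitution attack: try similar-character
    replacements for the highest-scoring keywords first and keep the
    candidate that lowers the victim's confidence the most."""
    order = sorted(range(len(words)),
                   key=lambda i: cksf(words, i, classify), reverse=True)
    base = classify("".join(words))
    best, best_drop = words, 0.0
    for i in order:
        for cand in similar.get(words[i], []):
            trial = words[:i] + [cand] + words[i + 1:]
            drop = base - classify("".join(trial))
            if drop > best_drop:
                best, best_drop = trial, drop
    return "".join(best), best_drop
```

For example, with a homophone table mapping 足球 to 祖球, `word_illusion(["足球", "比赛", "开始"], classify, {"足球": ["祖球"]})` replaces the most influential keyword and reduces the toy classifier's confidence from 0.9 to 0.4. The real method differs in scope: it strips punctuation first, draws candidates from a learned glyph embedding space, and re-scores whole candidate sequences with CKSFM rather than a single greedy substitution.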
Authors: WANG Chundong, SUN Jiaqi, YANG Wenjun (School of Computer Science and Engineering, Tianjin University of Technology, Tianjin 300384, China; National Engineering Laboratory for Computer Virus Prevention and Control Technology, Tianjin 300384, China)
Source: Computer Engineering (CAS; CSCD; PKU Core), 2023, No. 2, pp. 37-45 (9 pages)
Funding: Joint Funds of the National Natural Science Foundation of China (U1536122); National Key R&D Program of China, "Science and Technology Boosting the Economy 2020" Key Special Project (SQ2020YFF0413781); Tianjin Science and Technology Commission Major Special Project (15ZXDSGX00030); Tianjin Municipal Education Commission Research Program (2021YJSB252)
Keywords: deep neural network; Natural Language Processing (NLP); text classification; adversarial example; rectification understanding
