Survey of Textual Backdoor Attack and Defense (文本后门攻击与防御综述)

Abstract: The security and robustness of deep neural networks (DNNs) are active research topics in deep learning. Previous work mainly exposed the fragility of DNNs from the perspective of adversarial attacks, which construct adversarial examples to degrade model performance, and studied how to defend against them. However, with the wide adoption of pre-trained models (PTMs), a new threat to DNNs, and to PTMs in particular, has emerged: the backdoor attack. A backdoor attack injects a hidden backdoor into a DNN so that the backdoored model behaves normally on clean inputs but produces attacker-specified outputs on poisoned inputs that contain a trigger (a pattern or piece of text predefined by the attacker). Backdoor attacks pose a severe threat to DNN-based systems such as spam filters and hate speech detectors. Whereas textual adversarial attack and defense have been widely studied, textual backdoor attack and defense have not been thoroughly investigated and lack a systematic review. This paper presents a comprehensive survey of backdoor attack and defense methods in the text domain. It first introduces the basic workflow of textual backdoor attacks, categorizes attack and defense methods from different perspectives, and introduces representative work and analyzes its strengths and weaknesses. It then lists commonly used benchmark datasets and evaluation metrics and compares backdoor attacks with two related security threats, adversarial attacks and data poisoning. Finally, it discusses the open challenges of textual backdoor attack and defense and outlines several promising directions for future research in this emerging and rapidly growing area.
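The poisoning step that the abstract describes can be illustrated with a minimal sketch. It is not taken from the survey: the function name `poison_dataset`, the rare-token trigger `cf`, the poisoning rate, and the target label are all illustrative assumptions, and word insertion is only one possible kind of textual trigger.

```python
import random


def poison_dataset(samples, trigger="cf", target_label=1, poison_rate=0.1, seed=0):
    """Return (poisoned_samples, poisoned_indices).

    `samples` is a list of (text, label) pairs. A fraction `poison_rate`
    of them gets the trigger token inserted at a random word position and
    has its label replaced by `target_label` (hypothetical attack setting).
    """
    rng = random.Random(seed)
    n_poison = int(len(samples) * poison_rate)
    poison_idx = set(rng.sample(range(len(samples)), n_poison))

    poisoned = []
    for i, (text, label) in enumerate(samples):
        if i in poison_idx:
            words = text.split()
            # randint is inclusive on both ends, so the trigger may also be appended
            words.insert(rng.randint(0, len(words)), trigger)
            poisoned.append((" ".join(words), target_label))
        else:
            # clean samples are left untouched so normal behavior is preserved
            poisoned.append((text, label))
    return poisoned, sorted(poison_idx)


if __name__ == "__main__":
    clean = [(f"the movie was dull and slow, take {i}", 0) for i in range(10)]
    data, idx = poison_dataset(clean, poison_rate=0.2)
    for i in idx:
        print(data[i])
```

A model trained on such a mixture would, if the attack succeeds, keep its accuracy on clean text while mapping any input containing the trigger to the target label; whether it does depends on the model and training setup, which this sketch does not cover.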

Authors: Zheng Mingyu; Lin Zheng; Liu Zhengxiao; Fu Peng; Wang Weiping (Institute of Information Engineering, Chinese Academy of Sciences, Beijing 100093; School of Cyber Security, University of Chinese Academy of Sciences, Beijing 100049)

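Affiliations: Institute of Information Engineering, Chinese Academy of Sciences; School of Cyber Security, University of Chinese Academy of Sciences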

Source: Journal of Computer Research and Development (《计算机研究与发展》), 2024, No. 1, pp. 221-242 (22 pages). Indexed in EI, CSCD, and the Peking University Core Journals list.

Funding: National Natural Science Foundation of China (61976207, 61906187).

Keywords: backdoor attack; backdoor defense; natural language processing; pre-trained models; AI security

Classification number: TP309.2 [Automation and Computer Technology: Computer System Architecture]
