
A general backdoor defense strategy that suppresses non-semantic image information

Non-semantic information suppression relevant backdoor defense implementation
Abstract

Objective: Backdoor attacks have become a serious threat to convolutional neural networks (CNNs). During supervised training, an attacker can add trigger-bearing samples to the training set and relabel them with a target class, so that at inference time any input carrying the trigger is predicted as that target label. Such attacks severely threaten the interests of model owners, especially in high value-added areas such as financial security. A number of defense strategies have been proposed, but existing methods usually require prior knowledge of the attack or of the protected model, such as the type and size of the trigger, which limits their application scenarios. To address this problem, we propose a backdoor defense for image classification that requires no such prior knowledge and works purely by encoding and decoding the input; the module that does this is called the information purification network (IPN). Passing samples through the IPN removes the effect of embedded triggers.

Method: The information in an image sample is divided into two categories: semantic information relevant to the classification task and non-semantic information irrelevant to it. A backdoor attack forces the model to attend to non-semantic information during training so that trigger-bearing samples are predicted as the target label. The core idea of the defense is therefore to weaken the non-semantic information in a sample as much as possible while keeping its semantics unchanged, thereby suppressing the trigger. The IPN is a plug-and-play U-shaped CNN (U-Net) placed in front of the protected model; it encodes and decodes each input. Its inputs are the original clean samples and its outputs are called purified (strengthened) samples. For training, several clean classifiers with different architectures and training hyperparameters are first trained. The IPN is then optimized to make the difference between the purified sample and the original sample as large as possible, under the constraint that the purified sample is still correctly classified by all of these classifiers. The loss function accordingly has two parts, semantic information retention and non-semantic information suppression, and the weight between them balances image distortion against classification accuracy. Because decoding by the IPN destroys the structure of the trigger, a trigger-bearing sample is no longer predicted as the target label even if the model contains a backdoor; and because the semantic information is not weakened, trigger-bearing samples are predicted with their correct labels whether or not the model has been backdoored.

Result: All experiments were run on an NVIDIA GeForce RTX 3090 with Python 3.8.5 and PyTorch 1.9.1, on the MNIST, CIFAR10, and ImageNet10 datasets. ImageNet10 was built by randomly selecting 10 categories from ImageNet, giving 12,831 images in total, of which 10,264 were randomly chosen for training and the remaining 2,567 for testing. The IPN uses a U-Net architecture, and a variety of different triggers were used to implement backdoor attacks. On MNIST, the clean model classifies clean samples with 99% accuracy; with two different triggers, the backdoored models also reach about 99% accuracy on clean samples while the attack success rate is 100%. After all samples are encoded and decoded by the IPN, the accuracy on clean samples remains essentially unchanged, the attack success rate drops to about 10%, and 98% of trigger-bearing samples are predicted with their correct labels. Results on the other two datasets are similar: the accuracy on clean samples decreases slightly, the attack success rate falls to roughly 10%, and backdoor samples are correctly predicted with high accuracy. The intensity and size of the trigger affect the defensive performance to a certain extent, and the weight between the two loss terms affects clean-sample accuracy: the weight of the non-semantic suppression loss is positively correlated with the difference between images and negatively correlated with the classification accuracy of clean samples.

Conclusion: The proposed strategy requires no prior knowledge of the trigger or of the model to be protected. The classification accuracy of clean samples is essentially preserved, the backdoor attack success rate is reduced to the level of random guessing, and trigger-bearing samples are predicted with their correct labels regardless of whether the classifier has been backdoored. Training the IPN requires only clean training data and knowledge of the protected model's task; at deployment time, the IPN is simply placed in front of the protected model to preprocess its input samples. Experiments simulating multiple backdoor attacks on the three datasets show that the defense generalizes well across different attacks and models.
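The abstract describes the IPN training objective only in words (a semantic-retention term plus a non-semantic suppression term, balanced by a weight) and gives no formulas. The following PyTorch sketch shows one plausible reading of that objective; it is a minimal illustration, not the authors' implementation. The TinyPurifier stand-in for the U-Net IPN, the TinyClassifier stand-ins for the clean classifiers, the choice of cross-entropy for semantic retention, the negated pixel-wise MSE for non-semantic suppression, and the weight lambda_sup are all assumptions introduced here for illustration.

import torch
import torch.nn as nn
import torch.nn.functional as F

class TinyPurifier(nn.Module):
    # Hypothetical stand-in for the information purification network (IPN);
    # the paper uses a U-Net, sketched here as a small encoder-decoder.
    def __init__(self, ch=3):
        super().__init__()
        self.enc = nn.Sequential(
            nn.Conv2d(ch, 16, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),
        )
        self.dec = nn.Sequential(
            nn.ConvTranspose2d(32, 16, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(16, ch, 4, stride=2, padding=1), nn.Sigmoid(),
        )

    def forward(self, x):
        return self.dec(self.enc(x))

class TinyClassifier(nn.Module):
    # Hypothetical stand-in for one of the clean classifiers trained with
    # different architectures and hyperparameters.
    def __init__(self, ch=3, num_classes=10):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(ch, 16, 3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(16, num_classes),
        )

    def forward(self, x):
        return self.net(x)

def ipn_loss(purified, original, labels, clean_classifiers, lambda_sup=0.1):
    # Two-part objective as described in the abstract (exact form assumed):
    # 1) semantic retention: purified samples must still be classified
    #    correctly by every frozen clean classifier;
    sem = sum(F.cross_entropy(clf(purified), labels) for clf in clean_classifiers)
    sem = sem / len(clean_classifiers)
    # 2) non-semantic suppression: push the purified sample away from the
    #    original so task-irrelevant details (and thus triggers) are erased.
    sup = -F.mse_loss(purified, original)
    return sem + lambda_sup * sup

# One training step on clean data only (no knowledge of any attack is needed).
clean_classifiers = [TinyClassifier().eval() for _ in range(3)]
for clf in clean_classifiers:
    for p in clf.parameters():
        p.requires_grad_(False)  # classifiers stay frozen; only the IPN learns

ipn = TinyPurifier()
optimizer = torch.optim.Adam(ipn.parameters(), lr=1e-3)

x = torch.rand(8, 3, 32, 32)       # a clean batch (CIFAR10-sized, dummy data)
y = torch.randint(0, 10, (8,))     # its ground-truth labels
loss = ipn_loss(ipn(x), x, y, clean_classifiers)
optimizer.zero_grad()
loss.backward()
optimizer.step()

# At inference time the trained IPN is chained in front of the protected
# model: logits = protected_model(ipn(input_batch))

In this reading, increasing lambda_sup enlarges the gap between purified and original images, which matches the abstract's observation that a larger suppression weight increases the image difference but lowers the accuracy on clean samples.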
Authors: Guo Yusheng (郭钰生), Qian Zhenxing (钱振兴), Zhang Xinpeng (张新鹏), Chai Hongfeng (柴洪峰) (School of Computer Science, Fudan University, Shanghai 200438, China; Key Laboratory of Digital Culture Protection and Tourism Data Intelligent Computing, Ministry of Culture and Tourism, Shanghai 200438, China; Fintech Research Institute, Fudan University, Shanghai 200438, China)
Source: Journal of Image and Graphics (《中国图象图形学报》), 2023, No. 3, pp. 836-849 (14 pages); indexed in CSCD and the Peking University Core Journal list
Funding: National Natural Science Foundation of China (U20B2051, U1936214)
Keywords: convolutional neural network (CNN); model security; image classification; neural network backdoor; backdoor defense