
A general backdoor defense strategy that suppresses non-semantic image information

Non-semantic information suppression relevant backdoor defense implementation
Abstract

Objective: Backdoor attacks have become a serious threat to convolutional neural networks (CNNs). During supervised training, an attacker can add trigger-bearing samples to the training set and relabel them with a target class, so that at inference time any input carrying the trigger is predicted as that target label. Such attacks severely threaten the interests of model owners, especially in high value-added areas such as financial security. A number of defense strategies have been proposed, but existing methods usually require prior knowledge of the attack or of the protected model, such as the type and size of the trigger, which limits their application scenarios. To address this problem, we propose a backdoor defense for image classification that requires no such prior knowledge and works purely by encoding and decoding the input; the module that does this is called the information purification network (IPN). Passing samples through the IPN removes the effect of embedded triggers.

Method: The information in an image sample is divided into two categories: semantic information relevant to the classification task and non-semantic information irrelevant to it. A backdoor attack forces the model to attend to non-semantic information during training so that trigger-bearing samples are predicted as the target label. The core idea of the defense is therefore to weaken the non-semantic information in a sample as much as possible while keeping its semantics unchanged, thereby suppressing the trigger. The IPN is a plug-and-play U-shaped CNN (U-Net) placed in front of the protected model; it encodes and decodes each input. Its inputs are the original clean samples and its outputs are called purified (strengthened) samples. For training, several clean classifiers with different architectures and training hyperparameters are first trained. The IPN is then optimized to make the difference between the purified sample and the original sample as large as possible, under the constraint that the purified sample is still correctly classified by all of these classifiers. The loss function accordingly has two parts, semantic information retention and non-semantic information suppression, and the weight between them balances image distortion against classification accuracy. Because decoding by the IPN destroys the structure of the trigger, a trigger-bearing sample is no longer predicted as the target label even if the model contains a backdoor; and because the semantic information is not weakened, trigger-bearing samples are predicted with their correct labels whether or not the model has been backdoored.

Result: All experiments were run on an NVIDIA GeForce RTX 3090 with Python 3.8.5 and PyTorch 1.9.1, on the MNIST, CIFAR10, and ImageNet10 datasets. ImageNet10 was built by randomly selecting 10 categories from ImageNet, giving 12,831 images in total, of which 10,264 were randomly chosen for training and the remaining 2,567 for testing. The IPN uses a U-Net architecture, and a variety of different triggers were used to implement backdoor attacks. On MNIST, the clean model classifies clean samples with 99% accuracy; with two different triggers, the backdoored models also reach about 99% accuracy on clean samples while the attack success rate is 100%. After all samples are encoded and decoded by the IPN, the accuracy on clean samples remains essentially unchanged, the attack success rate drops to about 10%, and 98% of trigger-bearing samples are predicted with their correct labels. Results on the other two datasets are similar: the accuracy on clean samples decreases slightly, the attack success rate falls to roughly 10%, and backdoor samples are correctly predicted with high accuracy. The intensity and size of the trigger affect the defensive performance to a certain extent, and the weight between the two loss terms affects clean-sample accuracy: the weight of the non-semantic suppression loss is positively correlated with the difference between images and negatively correlated with the classification accuracy of clean samples.

Conclusion: The proposed strategy requires no prior knowledge of the trigger or of the model to be protected. The classification accuracy of clean samples is essentially preserved, the backdoor attack success rate is reduced to the level of random guessing, and trigger-bearing samples are predicted with their correct labels regardless of whether the classifier has been backdoored. Training the IPN requires only clean training data and knowledge of the protected model's task; at deployment time, the IPN is simply placed in front of the protected model to preprocess its input samples. Experiments simulating multiple backdoor attacks on the three datasets show that the defense generalizes well across different attacks and models.
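The abstract describes the IPN training objective only in words (a semantic-retention term plus a non-semantic suppression term, balanced by a weight) and gives no formulas. The following PyTorch sketch shows one plausible reading of that objective; it is a minimal illustration, not the authors' implementation. The TinyPurifier stand-in for the U-Net IPN, the TinyClassifier stand-ins for the clean classifiers, the choice of cross-entropy for semantic retention, the negated pixel-wise MSE for non-semantic suppression, and the weight lambda_sup are all assumptions introduced here for illustration.

import torch
import torch.nn as nn
import torch.nn.functional as F

class TinyPurifier(nn.Module):
    # Hypothetical stand-in for the information purification network (IPN);
    # the paper uses a U-Net, sketched here as a small encoder-decoder.
    def __init__(self, ch=3):
        super().__init__()
        self.enc = nn.Sequential(
            nn.Conv2d(ch, 16, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),
        )
        self.dec = nn.Sequential(
            nn.ConvTranspose2d(32, 16, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(16, ch, 4, stride=2, padding=1), nn.Sigmoid(),
        )

    def forward(self, x):
        return self.dec(self.enc(x))

class TinyClassifier(nn.Module):
    # Hypothetical stand-in for one of the clean classifiers trained with
    # different architectures and hyperparameters.
    def __init__(self, ch=3, num_classes=10):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(ch, 16, 3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(16, num_classes),
        )

    def forward(self, x):
        return self.net(x)

def ipn_loss(purified, original, labels, clean_classifiers, lambda_sup=0.1):
    # Two-part objective as described in the abstract (exact form assumed):
    # 1) semantic retention: purified samples must still be classified
    #    correctly by every frozen clean classifier;
    sem = sum(F.cross_entropy(clf(purified), labels) for clf in clean_classifiers)
    sem = sem / len(clean_classifiers)
    # 2) non-semantic suppression: push the purified sample away from the
    #    original so task-irrelevant details (and thus triggers) are erased.
    sup = -F.mse_loss(purified, original)
    return sem + lambda_sup * sup

# One training step on clean data only (no knowledge of any attack is needed).
clean_classifiers = [TinyClassifier().eval() for _ in range(3)]
for clf in clean_classifiers:
    for p in clf.parameters():
        p.requires_grad_(False)  # classifiers stay frozen; only the IPN learns

ipn = TinyPurifier()
optimizer = torch.optim.Adam(ipn.parameters(), lr=1e-3)

x = torch.rand(8, 3, 32, 32)       # a clean batch (CIFAR10-sized, dummy data)
y = torch.randint(0, 10, (8,))     # its ground-truth labels
loss = ipn_loss(ipn(x), x, y, clean_classifiers)
optimizer.zero_grad()
loss.backward()
optimizer.step()

# At inference time the trained IPN is chained in front of the protected
# model: logits = protected_model(ipn(input_batch))

In this reading, increasing lambda_sup enlarges the gap between purified and original images, which matches the abstract's observation that a larger suppression weight increases the image difference but lowers the accuracy on clean samples.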
Authors: Guo Yusheng (郭钰生), Qian Zhenxing (钱振兴), Zhang Xinpeng (张新鹏), Chai Hongfeng (柴洪峰) (School of Computer Science, Fudan University, Shanghai 200438, China; Key Laboratory of Digital Culture Protection and Tourism Data Intelligent Computing, Ministry of Culture and Tourism, Shanghai 200438, China; Fintech Research Institute, Fudan University, Shanghai 200438, China)
Source: Journal of Image and Graphics (《中国图象图形学报》), 2023, No. 3, pp. 836-849 (14 pages); indexed in CSCD and the Peking University Core Journal list
Funding: National Natural Science Foundation of China (U20B2051, U1936214)
Keywords: convolutional neural network (CNN); model security; image classification; neural network backdoor; backdoor defense