
Pre-trained-Model-Assisted Self-Filtering Defense Against Backdoor Samples (预训练模型辅助的后门样本自过滤防御方法)

Self-filtering of Backdoor Samples by Aid of Pre-trained Model
Abstract: Deep neural networks (DNNs) are widely deployed across many environments and tasks because of their strong performance, which makes their security increasingly important. Backdoor attacks, a recent class of attack, pose a serious threat to users. During training, the attacker adds a specific backdoor pattern, called a trigger, to a small number of training samples by modifying pixels locally or globally, and labels those samples as the target class so that the trained model learns the backdoor. The backdoored model then classifies test samples carrying the same trigger as the target class with high probability, regardless of their ground truth, while its performance on benign samples is unaffected. Users usually have no prior knowledge of the backdoor, so the attack is hard to detect. This paper proposes a pre-trained-model-assisted self-filtering method for backdoor samples to defend against backdoor attacks. The method has two components: target class detection and backdoor sample self-filtering. In the first component, a pre-trained model extracts a feature representation for each sample, and the k-nearest neighbor (kNN) algorithm is used to detect the target class. In the second component, a partial classification model is first learned from the non-target-class samples, and then backdoor sample filtering and model learning are alternated over multiple iterations. The procedure filters out backdoor samples effectively and also yields a complete benign model.
Authors: LIU Qi (刘琦); ZHANG Tian-xing (张天行); LU Xiao-feng (陆小锋); WU Han-zhou (吴汉舟); MAO Jian-hua (毛建华); SUN Guang-ling (孙广玲). School of Communication and Information Engineering, Shanghai University, Shanghai 200444, China
Source: Computer Technology and Development (《计算机技术与发展》), 2023, No. 1, pp. 121-129 (9 pages)
Funding: Science and Technology Innovation Action Plan of the Science and Technology Commission of Shanghai Municipality (21511102605); National Natural Science Foundation of China (61902235).
Keywords: deep neural networks; backdoor attack; pre-trained model; kNN; self-filtering
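The abstract does not say how the kNN step decides which class is the target, so the following is only a minimal sketch of one plausible criterion, not the authors' procedure. It assumes that a clean pre-trained extractor keeps a poisoned sample's features close to its true class, so the target class ends up with the lowest average agreement between a sample's label and the labels of its k nearest neighbours. The function names (`knn_label_agreement`, `detect_target_class`, `make_toy_dataset`), the agreement score, and the synthetic 2-D "features" are all illustrative assumptions. The sketch covers the first component only; the alternating filtering and model learning of the second component requires training a classifier and is not shown.

```python
import random
from collections import defaultdict


def knn_label_agreement(features, labels, k=5):
    """For each sample, return the fraction of its k nearest neighbours
    (squared Euclidean distance in feature space) that share its label."""
    scores = []
    for i, fi in enumerate(features):
        dists = sorted(
            (sum((a - b) ** 2 for a, b in zip(fi, fj)), j)
            for j, fj in enumerate(features)
            if j != i
        )
        neighbours = [j for _, j in dists[:k]]
        scores.append(sum(labels[j] == labels[i] for j in neighbours) / k)
    return scores


def detect_target_class(features, labels, k=5):
    """Flag the class with the lowest mean kNN label agreement.
    This criterion is an assumption, not taken from the paper."""
    scores = knn_label_agreement(features, labels, k)
    per_class = defaultdict(list)
    for score, label in zip(scores, labels):
        per_class[label].append(score)
    mean_agreement = {c: sum(v) / len(v) for c, v in per_class.items()}
    suspected = min(mean_agreement, key=mean_agreement.get)
    return suspected, mean_agreement, scores


def make_toy_dataset(num_classes=5, per_class=12, target=0, seed=0):
    """Hypothetical stand-in for pre-trained features: one tight,
    well-separated 2-D cluster per true class. One sample from each
    non-target class is relabelled as `target` to mimic poisoning."""
    rng = random.Random(seed)
    features, labels, poisoned = [], [], []
    for c in range(num_classes):
        for n in range(per_class):
            features.append([100.0 * c + rng.uniform(-1, 1), rng.uniform(-1, 1)])
            is_poisoned = c != target and n == 0
            labels.append(target if is_poisoned else c)
            if is_poisoned:
                poisoned.append(len(labels) - 1)
    return features, labels, poisoned


features, labels, poisoned = make_toy_dataset()
suspected, mean_agreement, scores = detect_target_class(features, labels)
print("suspected target class:", suspected)
print("mean agreement per class:", mean_agreement)
```

In practice the toy clusters would be replaced by features taken from a real pre-trained network, and the per-sample scores returned alongside the suspected class could serve as a starting point for deciding which target-class samples to inspect first.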