Abstract
Research on adversarial examples has confirmed the vulnerability of deep neural networks (DNNs). If the generation of adversarial examples is left unconstrained, readily accessible images are no longer secure and may at any time pose a threat to non-robust DNNs. However, existing adversarial defenses primarily aim to prevent adversarial examples from successfully attacking DNNs, rather than to prevent their generation. We therefore propose a novel adversarial defense mechanism, referred to as immune defense. Immune defense proactively adds carefully designed quasi-imperceptible perturbations to raw images so that an attacker cannot craft effective adversarial examples for them, thereby protecting both the images and the DNNs. Such benign perturbations are referred to as immune perturbations, and the perturbed images as immune examples. For white-box immune defense, we propose Hyperbolic Tangent Immune Defense (HTID) to craft white-box immune examples with high classification accuracy, defensive performance, and visual quality. For black-box immune defense, we propose Moment-based Immune Defense (MID) to enhance the transferability of immune examples and thus ensure their defensive performance against unknown adversarial attacks. In addition, we propose the immune rate to measure the defensive performance of immune examples more accurately. Extensive experiments on CIFAR-10, MNIST, STL-10, and Caltech-256 show that the immune examples crafted by HTID and MID have high classification accuracy, reaching 100.0% on Inception-v3, ResNet-50, LeNet-5, and Model C, which is 10.5% higher than the original accuracy on average. The immune examples also have high visual quality, with SSIM between 0.822 and 0.900. The experiments further show that MID has higher transferability than HTID: on the four datasets, the immune examples that MID crafts against AdvGAN achieve average immune rates of 62.1%, 52.1%, 56.8%, and 48.7% when defending against 11 other adversarial attacks, which are 15.0%, 10.8%, 17.5%, and 15.7% higher than those of HTID, respectively.
Authors
吴昊
王金伟
罗向阳
马宾
WU Hao; WANG Jin-Wei; LUO Xiang-Yang; MA Bin (School of Computer Science, Nanjing University of Information Science and Technology, Nanjing 210044; Engineering Research Center of Digital Forensics, Ministry of Education, Nanjing University of Information Science and Technology, Nanjing 210044; School of Cyber Security, PLA Strategic Support Force Information Engineering University, Zhengzhou 450001; School of Cyber Security, Qilu University of Technology, Jinan 250353)
Source
Chinese Journal of Computers (《计算机学报》)
EI
CAS
CSCD
PKU Core Journal (北大核心)
2024, Issue 8, pp. 1786-1812 (27 pages)
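Funding
Supported by the National Natural Science Foundation of China (Nos. 62072250, 62172435, U1804263, U20B2065, 61872203, 71802110, 61802212),
the Zhongyuan Science and Technology Innovation Leading Talent Project of China (No. 214200510019),
and the Open Foundation of the Henan Key Laboratory of Cyberspace Situation Awareness (No. HNTS2022002).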
Keywords
deep neural network
adversarial example
adversarial defense
immune defense
transferability