Abstract
To address the leakage of model information caused by interpretability in Deep Neural Networks (DNNs), the feasibility of generating adversarial examples with the Gradient-weighted Class Activation Mapping (Grad-CAM) interpretation method in a white-box setting is demonstrated, and an untargeted black-box attack algorithm, the dynamic genetic algorithm, is proposed. The algorithm first improves the fitness function according to the relationship between changes in the interpretation region and the positions of the perturbed pixels. It then runs multiple rounds of a genetic algorithm, steadily reducing the perturbation value while increasing the number of perturbed pixels; the coordinate set produced by each round is retained and reused in the next round, until the set of perturbed pixels flips the predicted label without exceeding the perturbation bound. In experiments, the proposed algorithm achieved an average attack success rate of 92.88% across the AlexNet, VGG-19, ResNet-50 and SqueezeNet models. Compared with the One pixel algorithm, its running time was 8% longer, but its success rate was 16.53 percentage points higher. In addition, with a shorter running time, its success rate was 3.18 percentage points higher than that of the Adaptive Fast Gradient Sign Method (Ada-FGSM) algorithm and 8.63 percentage points higher than that of the Projection & Probability-driven Black-box Attack (PPBA) algorithm, and close to that of the Boundary-attack algorithm. The results show that the dynamic genetic algorithm based on the interpretation method can carry out adversarial attacks effectively.
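The multi-round search described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the names `toy_predict`, `perturb`, `ga_round` and `dynamic_ga_attack`, the doubling of the pixel count, the decay factor of 0.9, and the toy two-class model are all assumptions made for illustration. In particular, the fitness used here is simply the model's confidence in the original label, whereas the paper's fitness function also takes the interpretation region into account.

```python
import numpy as np

def toy_predict(img):
    """Hypothetical stand-in for a black-box model: two-class probabilities."""
    p1 = 1.0 / (1.0 + np.exp(-20.0 * (img.mean() - 0.5)))
    return np.array([1.0 - p1, p1])

def perturb(img, coords, signs, eps):
    """Shift the chosen pixels by +/- eps and clip to the valid range [0, 1]."""
    adv = img.copy()
    r, c = coords[:, 0], coords[:, 1]
    adv[r, c] = np.clip(img[r, c] + eps * signs, 0.0, 1.0)
    return adv

def ga_round(img, label, predict, kept, n_pixels, eps, rng, pop=20, gens=15):
    """One GA round over n_pixels coordinates; the first len(kept) stay fixed."""
    h, w = img.shape
    k = len(kept)

    def fitness(ind):
        # Lower is better: confidence the model still gives the original label.
        # (Assumption: the paper's fitness also uses the interpretation region.)
        return predict(perturb(img, ind[0], ind[1], eps))[label]

    def random_individual():
        fresh = np.column_stack([rng.integers(0, h, n_pixels - k),
                                 rng.integers(0, w, n_pixels - k)])
        return np.vstack([kept, fresh]), rng.choice([-1.0, 1.0], n_pixels)

    population = [random_individual() for _ in range(pop)]
    for _ in range(gens):
        population.sort(key=fitness)
        parents = population[: pop // 2]          # elitist selection
        children = []
        while len(parents) + len(children) < pop:
            i, j = rng.choice(len(parents), 2, replace=False)
            a, b = parents[i], parents[j]
            mask = rng.random(n_pixels) < 0.5     # uniform crossover
            coords = np.where(mask[:, None], a[0], b[0])
            signs = np.where(mask, a[1], b[1])
            m = rng.integers(0, n_pixels)         # mutate one gene
            signs[m] = -signs[m]
            if m >= k:                            # carried-over coordinates never move
                coords[m] = (rng.integers(0, h), rng.integers(0, w))
            children.append((coords, signs))
        population = parents + children
    return min(population, key=fitness)

def dynamic_ga_attack(img, predict, rng, eps0=1.0, decay=0.9, max_rounds=8):
    """Grow the pixel set and shrink the perturbation until the label flips."""
    label = int(np.argmax(predict(img)))
    kept = np.empty((0, 2), dtype=int)
    n_pixels, eps = 1, eps0
    for _ in range(max_rounds):
        coords, signs = ga_round(img, label, predict, kept, n_pixels, eps, rng)
        adv = perturb(img, coords, signs, eps)
        if int(np.argmax(predict(adv))) != label:
            return adv, n_pixels, eps
        kept = coords                             # reuse this round's coordinates
        n_pixels, eps = 2 * n_pixels, decay * eps
    return None, n_pixels, eps

rng = np.random.default_rng(0)
image = np.full((8, 8), 0.52)                     # toy greyscale input
adv, n_used, eps_used = dynamic_ga_attack(image, toy_predict, rng)
print(adv is not None, n_used, round(eps_used, 3))
```

Each call to `predict` only returns output probabilities, so the sketch stays within the black-box setting; `max_rounds` plays the role of the perturbation bound by capping how far the pixel set may grow.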
Authors
陈权
李莉
陈永乐
段跃兴
CHEN Quan; LI Li; CHEN Yongle; DUAN Yuexing (College of Information and Computer, Taiyuan University of Technology, Jinzhong, Shanxi 030600, China)
Source
《计算机应用》
CSCD
PKU Core (Peking University core journal list)
2022, No. 2, pp. 510-518 (9 pages)
Journal of Computer Applications
Funding
Key Research and Development Program of Shanxi Province (201903D121121).
Keywords
Deep Neural Network (DNN)
interpretation method
saliency map
adversarial attack
genetic algorithm