Abstract
Deep learning models have achieved strong results in areas such as image classification, but they remain vulnerable to adversarial examples. Using adversarial attack algorithms, an attacker can craft small perturbations that are hard to see with the naked eye yet cause the model to misclassify, which poses a serious security risk to image classification and other deep learning applications. To improve the robustness of image classification models, this paper proposes an adversarial defense method, built on a conditional diffusion model, that combines adversarial example detection with adversarial example purification. The method detects and purifies adversarial examples without modifying the target model, so the model's structure and parameters are left unchanged. It consists of two modules. For detection, it uses inconsistency enhancement: an image restoration model is trained that incorporates both the high-dimensional features of the target model and the basic features of the image, and adversarial examples are detected by comparing the inconsistency between the original input and the restored result. For purification, it uses an end-to-end approach that introduces image artifacts while the denoising model runs. The detection and purification modules are placed in front of the target model so that its accuracy is preserved, and a purification strategy is selected according to the detection result, which removes adversarial perturbations and improves the robustness of the target model. The method was compared with five existing adversarial detection and purification methods on the CIFAR10 and CIFAR100 datasets, with adversarial examples generated by five attack algorithms. The results show that, for adversarial examples with small perturbations, the detection accuracy of the proposed method is 5 to 9 percentage points higher than that of Argos. Compared with Adaptive Denoising Purification (ADP), the proposed method defends more stably across different types of adversarial examples, and under Backward Pass Differentiable Approximation (BPDA) attacks its purification performance is 1.3 percentage points higher than that of ADP.
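The detect-then-purify control flow described in the abstract can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not the authors' implementation: the names `inconsistency`, `defend`, `median3`, and `toy_classifier` are hypothetical, a 3-tap median filter stands in for both the conditional-diffusion restoration model and the purifier, and the threshold is arbitrary. The sketch also assumes that inputs judged clean skip purification, which is one possible reading of "selecting a purification strategy according to the detection result".

```python
from typing import Callable, List, Sequence

Image = List[float]  # toy stand-in: a flattened image with values in [0, 1]


def inconsistency(x: Sequence[float], restored: Sequence[float]) -> float:
    """Mean absolute difference between an input and its restoration."""
    return sum(abs(a - b) for a, b in zip(x, restored)) / len(x)


def median3(x: Sequence[float]) -> Image:
    """3-tap median filter with replicated edges.

    Hypothetical stand-in for the restoration / purification model;
    the paper uses a conditional diffusion model instead.
    """
    padded = [x[0]] + list(x) + [x[-1]]
    return [sorted(padded[i:i + 3])[1] for i in range(len(x))]


def toy_classifier(x: Sequence[float]) -> int:
    """Hypothetical target model: class 1 if the mean intensity exceeds 0.5."""
    return 1 if sum(x) / len(x) > 0.5 else 0


def defend(
    x: Image,
    restore: Callable[[Image], Image],
    purify: Callable[[Image], Image],
    classify: Callable[[Image], int],
    threshold: float,
) -> int:
    """Detect first, then purify only inputs flagged as adversarial.

    The target model (`classify`) is called as a black box and is
    never modified.
    """
    score = inconsistency(x, restore(x))
    if score > threshold:  # flagged as adversarial
        x = purify(x)
    return classify(x)


# Toy data: a flat "clean" image and a copy with two sparse spikes
# that push the mean over the classifier's decision boundary.
clean = [0.4] * 8
adversarial = [0.4, 0.4, 1.0, 0.4, 0.4, 1.0, 0.4, 0.4]
THRESHOLD = 0.05  # arbitrary value chosen for this toy example

label_clean = defend(clean, median3, median3, toy_classifier, THRESHOLD)
label_adv = defend(adversarial, median3, median3, toy_classifier, THRESHOLD)
```

In this toy setting the undefended classifier assigns the spiked input to class 1, while `defend` flags it, filters the spikes out, and returns class 0, the same label as the clean input. A real system would replace `median3` with the trained restoration and denoising models and calibrate the threshold on held-out data.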
Authors
CHEN Zimin (陈子民)
GUAN Zhitao (关志涛)
School of Control and Computer Engineering, North China Electric Power University, Beijing 102206, China
Source
Computer Engineering (《计算机工程》)
CAS
CSCD
PKU Core (北大核心)
2024, Issue 12, pp. 296-305 (10 pages)
Funding
National Natural Science Foundation of China (62372173).
Keywords
adversarial defense
adversarial example detection
adversarial purification
diffusion model
image denoising