Federated learning is widely used to solve the problem of data decentralization and can provide privacy protectionfor data owners. However, since multiple participants are required in federated learning, this allows a...Federated learning is widely used to solve the problem of data decentralization and can provide privacy protectionfor data owners. However, since multiple participants are required in federated learning, this allows attackers tocompromise. Byzantine attacks pose great threats to federated learning. Byzantine attackers upload maliciouslycreated local models to the server to affect the prediction performance and training speed of the global model. Todefend against Byzantine attacks, we propose a Byzantine robust federated learning scheme based on backdoortriggers. In our scheme, backdoor triggers are embedded into benign data samples, and then malicious localmodels can be identified by the server according to its validation dataset. Furthermore, we calculate the adjustmentfactors of local models according to the parameters of their final layers, which are used to defend against datapoisoning-based Byzantine attacks. To further enhance the robustness of our scheme, each localmodel is weightedand aggregated according to the number of times it is identified as malicious. Relevant experimental data showthat our scheme is effective against Byzantine attacks in both independent identically distributed (IID) and nonindependentidentically distributed (non-IID) scenarios.展开更多
Deep Neural Networks(DNNs)are integral to various aspects of modern life,enhancing work efficiency.Nonethe-less,their susceptibility to diverse attack methods,including backdoor attacks,raises security concerns.We aim...Deep Neural Networks(DNNs)are integral to various aspects of modern life,enhancing work efficiency.Nonethe-less,their susceptibility to diverse attack methods,including backdoor attacks,raises security concerns.We aim to investigate backdoor attack methods for image categorization tasks,to promote the development of DNN towards higher security.Research on backdoor attacks currently faces significant challenges due to the distinct and abnormal data patterns of malicious samples,and the meticulous data screening by developers,hindering practical attack implementation.To overcome these challenges,this study proposes a Gaussian Noise-Targeted Universal Adversarial Perturbation(GN-TUAP)algorithm.This approach restricts the direction of perturbations and normalizes abnormal pixel values,ensuring that perturbations progress as much as possible in a direction perpendicular to the decision hyperplane in linear problems.This limits anomalies within the perturbations improves their visual stealthiness,and makes them more challenging for defense methods to detect.To verify the effectiveness,stealthiness,and robustness of GN-TUAP,we proposed a comprehensive threat model.Based on this model,extensive experiments were conducted using the CIFAR-10,CIFAR-100,GTSRB,and MNIST datasets,comparing our method with existing state-of-the-art attack methods.We also tested our perturbation triggers using various defense methods and further experimented on the robustness of the triggers against noise filtering techniques.The experimental outcomes demonstrate that backdoor attacks leveraging perturbations generated via our algorithm exhibit cross-model attack effectiveness and superior stealthiness.Furthermore,they possess robust anti-detection capabilities and maintain commendable performance when subjected to noise-filtering methods.展开更多
In recent years,the number of parameters of deep neural networks(DNNs)has been increasing rapidly.The training of DNNs is typically computation-intensive.As a result,many users leverage cloud computing and outsource t...In recent years,the number of parameters of deep neural networks(DNNs)has been increasing rapidly.The training of DNNs is typically computation-intensive.As a result,many users leverage cloud computing and outsource their training procedures.Outsourcing computation results in a potential risk called backdoor attack,in which a welltrained DNN would performabnormally on inputs with a certain trigger.Backdoor attacks can also be classified as attacks that exploit fake images.However,most backdoor attacks design a uniformtrigger for all images,which can be easilydetectedand removed.In this paper,we propose a novel adaptivebackdoor attack.We overcome this defect and design a generator to assign a unique trigger for each image depending on its texture.To achieve this goal,we use a texture complexitymetric to create a specialmask for eachimage,which forces the trigger tobe embedded into the rich texture regions.The trigger is distributed in texture regions,which makes it invisible to humans.Besides the stealthiness of triggers,we limit the range of modification of backdoor models to evade detection.Experiments show that our method is efficient in multiple datasets,and traditional detectors cannot reveal the existence of a backdoor.展开更多
Backdoor attacks are emerging security threats to deep neural networks.In these attacks,adversaries manipulate the network by constructing training samples embedded with backdoor triggers.The backdoored model performs...Backdoor attacks are emerging security threats to deep neural networks.In these attacks,adversaries manipulate the network by constructing training samples embedded with backdoor triggers.The backdoored model performs as expected on clean test samples but consistently misclassifies samples containing the backdoor trigger as a specific target label.While quantum neural networks(QNNs)have shown promise in surpassing their classical counterparts in certain machine learning tasks,they are also susceptible to backdoor attacks.However,current attacks on QNNs are constrained by the adversary's understanding of the model structure and specific encoding methods.Given the diversity of encoding methods and model structures in QNNs,the effectiveness of such backdoor attacks remains uncertain.In this paper,we propose an algorithm that leverages dataset-based optimization to initiate backdoor attacks.A malicious adversary can embed backdoor triggers into a QNN model by poisoning only a small portion of the data.The victim QNN maintains high accuracy on clean test samples without the trigger but outputs the target label set by the adversary when predicting samples with the trigger.Furthermore,our proposed attack cannot be easily resisted by existing backdoor detection methods.展开更多
The Deep Neural Networks(DNN)training process is widely affected by backdoor attacks.The backdoor attack is excellent at concealing its identity in the DNN by performing well on regular samples and displaying maliciou...The Deep Neural Networks(DNN)training process is widely affected by backdoor attacks.The backdoor attack is excellent at concealing its identity in the DNN by performing well on regular samples and displaying malicious behavior with data poisoning triggers.The state-of-art backdoor attacks mainly follow a certain assumption that the trigger is sample-agnostic and different poisoned samples use the same trigger.To overcome this problem,in this work we are creating a backdoor attack to check their strength to withstand complex defense strategies,and in order to achieve this objective,we are developing an improved Convolutional Neural Network(ICNN)model optimized using a Gradient-based Optimization(GBO)(ICNN-GBO)algorithm.In the ICNN-GBO model,we are injecting the triggers via a steganography and regularization technique.We are generating triggers using a single-pixel,irregular shape,and different sizes.The performance of the proposed methodology is evaluated using different performance metrics such as Attack success rate,stealthiness,pollution index,anomaly index,entropy index,and functionality.When the CNN-GBO model is trained with the poisoned dataset,it will map the malicious code to the target label.The proposed scheme’s effectiveness is verified by the experiments conducted on both the benchmark datasets namely CIDAR-10 andMSCELEB 1M dataset.The results demonstrate that the proposed methodology offers significant defense against the conventional backdoor attack detection frameworks such as STRIP and Neutral cleanse.展开更多
Deep learning models are well known to be susceptible to backdoor attack,where the attacker only needs to provide a tampered dataset on which the triggers are injected.Models trained on the dataset will passively impl...Deep learning models are well known to be susceptible to backdoor attack,where the attacker only needs to provide a tampered dataset on which the triggers are injected.Models trained on the dataset will passively implant the backdoor,and triggers on the input can mislead the models during testing.Our study shows that the model shows different learning behaviors in clean and poisoned subsets during training.Based on this observation,we propose a general training pipeline to defend against backdoor attacks actively.Benign models can be trained from the unreli-able dataset by decoupling the learning process into three stages,i.e.,supervised learning,active unlearning,and active semi-supervised fine-tuning.The effectiveness of our approach has been shown in numerous experiments across various backdoor attacks and datasets.展开更多
Federated Learning(FL),a burgeoning technology,has received increasing attention due to its privacy protection capability.However,the base algorithm FedAvg is vulnerable when it suffers from so-called backdoor attacks...Federated Learning(FL),a burgeoning technology,has received increasing attention due to its privacy protection capability.However,the base algorithm FedAvg is vulnerable when it suffers from so-called backdoor attacks.Former researchers proposed several robust aggregation methods.Unfortunately,due to the hidden characteristic of backdoor attacks,many of these aggregation methods are unable to defend against backdoor attacks.What's more,the attackers recently have proposed some hiding methods that further improve backdoor attacks'stealthiness,making all the existing robust aggregation methods fail.To tackle the threat of backdoor attacks,we propose a new aggregation method,X-raying Models with A Matrix(XMAM),to reveal the malicious local model updates submitted by the backdoor attackers.Since we observe that the output of the Softmax layer exhibits distinguishable patterns between malicious and benign updates,unlike the existing aggregation algorithms,we focus on the Softmax layer's output in which the backdoor attackers are difficult to hide their malicious behavior.Specifically,like medical X-ray examinations,we investigate the collected local model updates by using a matrix as an input to get their Softmax layer's outputs.Then,we preclude updates whose outputs are abnormal by clustering.Without any training dataset in the server,the extensive evaluations show that our XMAM can effectively distinguish malicious local model updates from benign ones.For instance,when other methods fail to defend against the backdoor attacks at no more than 20%malicious clients,our method can tolerate 45%malicious clients in the black-box mode and about 30%in Projected Gradient Descent(PGD)mode.Besides,under adaptive attacks,the results demonstrate that XMAM can still complete the global model training task even when there are 40%malicious clients.Finally,we analyze our method's screening complexity and compare the real screening time with other methods.The results show that XMAM is about 10–10000 times faster than the existing methods.展开更多
The pre-training-then-fine-tuning paradigm has been widely used in deep learning.Due to the huge computation cost for pre-training,practitioners usually download pre-trained models from the Internet and fine-tune them...The pre-training-then-fine-tuning paradigm has been widely used in deep learning.Due to the huge computation cost for pre-training,practitioners usually download pre-trained models from the Internet and fine-tune them on downstream datasets,while the downloaded models may suffer backdoor attacks.Different from previous attacks aiming at a target task,we show that a backdoored pre-trained model can behave maliciously in various downstream tasks without foreknowing task information.Attackers can restrict the output representations(the values of output neurons)of trigger-embedded samples to arbitrary predefined values through additional training,namely neuron-level backdoor attack(NeuBA).Since fine-tuning has little effect on model parameters,the fine-tuned model will retain the backdoor functionality and predict a specific label for the samples embedded with the same trigger.To provoke multiple labels in a specific task,attackers can introduce several triggers with predefined contrastive values.In the experiments of both natural language processing(NLP)and computer vision(CV),we show that NeuBA can well control the predictions for trigger-embedded instances with different trigger designs.Our findings sound a red alarm for the wide use of pre-trained models.Finally,we apply several defense methods to NeuBA and find that model pruning is a promising technique to resist NeuBA by omitting backdoored neurons.展开更多
Recently,deep neural networks have been shown to be vulnerable to backdoor attacks.A backdoor is inserted into neural networks via this attack paradigm,thus compromising the integrity of the network.As soon as an atta...Recently,deep neural networks have been shown to be vulnerable to backdoor attacks.A backdoor is inserted into neural networks via this attack paradigm,thus compromising the integrity of the network.As soon as an attacker presents a trigger during the testing phase,the backdoor in the model is activated,allowing the network to make specific wrong predictions.It is extremely important to defend against backdoor attacks since they are very stealthy and dangerous.In this paper,we propose a novel defense mechanism,Neural Behavioral Alignment(NBA),for backdoor removal.NBA optimizes the distillation process in terms of knowledge form and distillation samples to improve defense performance according to the characteristics of backdoor defense.NBA builds high-level representations of neural behavior within networks in order to facilitate the transfer of knowledge.Additionally,NBA crafts pseudo samples to induce student models exhibit backdoor neural behavior.By aligning the backdoor neural behavior from the student network with the benign neural behavior from the teacher network,NBA enables the proactive removal of backdoors.Extensive experiments show that NBA can effectively defend against six different backdoor attacks and outperform five state-of-the-art defenses.展开更多
高精度联邦学习模型的训练需要消耗大量的用户本地资源,参与训练的用户能够通过私自出售联合训练的模型获得非法收益.为实现联邦学习模型的产权保护,利用深度学习后门技术不影响主任务精度而仅对少量触发集样本造成误分类的特征,构建一...高精度联邦学习模型的训练需要消耗大量的用户本地资源,参与训练的用户能够通过私自出售联合训练的模型获得非法收益.为实现联邦学习模型的产权保护,利用深度学习后门技术不影响主任务精度而仅对少量触发集样本造成误分类的特征,构建一种基于模型后门的联邦学习水印(federated learning watermark based on backdoor,FLWB)方案,能够允许各参与训练的用户在其本地模型中分别嵌入私有水印,再通过云端的模型聚合操作将私有后门水印映射到全局模型作为联邦学习的全局水印.之后提出分步训练方法增强各私有后门水印在全局模型的表达效果,使得FLWB方案能够在不影响全局模型精度的前提下容纳各参与用户的私有水印.理论分析证明了FLWB方案的安全性,实验验证分步训练方法能够让全局模型在仅造成1%主任务精度损失的情况下有效容纳参与训练用户的私有水印.最后,采用模型压缩攻击和模型微调攻击对FLWB方案进行攻击测试,其结果表明FLWB方案在模型压缩到30%时仍能保留80%以上的水印,在4种不同的微调攻击下能保留90%以上的水印,具有很好的鲁棒性.展开更多
针对现有联邦学习后门防御方法不能实现对模型已嵌入后门特征的有效清除同时会降低主任务准确率的问题,提出了一种基于对比训练的联邦学习后门防御方法 Contra FL。利用对比训练来破坏后门样本在特征空间中的聚类过程,使联邦学习全局模...针对现有联邦学习后门防御方法不能实现对模型已嵌入后门特征的有效清除同时会降低主任务准确率的问题,提出了一种基于对比训练的联邦学习后门防御方法 Contra FL。利用对比训练来破坏后门样本在特征空间中的聚类过程,使联邦学习全局模型分类结果与后门触发器特征无关。具体而言,服务器通过执行触发器生成算法构造生成器池,以还原全局模型训练样本中可能存在的后门触发器;进而,服务器将触发器生成器池下发给各参与方,各参与方将生成的后门触发器添加至本地样本,以实现后门数据增强,最终通过对比训练有效消除后门攻击的负面影响。实验结果表明,Contra FL能够有效防御联邦学习中的多种后门攻击,且效果优于现有防御方法。展开更多
基金in part by the National Social Science Foundation of China under Grant 20BTQ058in part by the Natural Science Foundation of Hunan Province under Grant 2023JJ50033。
文摘Federated learning is widely used to solve the problem of data decentralization and can provide privacy protectionfor data owners. However, since multiple participants are required in federated learning, this allows attackers tocompromise. Byzantine attacks pose great threats to federated learning. Byzantine attackers upload maliciouslycreated local models to the server to affect the prediction performance and training speed of the global model. Todefend against Byzantine attacks, we propose a Byzantine robust federated learning scheme based on backdoortriggers. In our scheme, backdoor triggers are embedded into benign data samples, and then malicious localmodels can be identified by the server according to its validation dataset. Furthermore, we calculate the adjustmentfactors of local models according to the parameters of their final layers, which are used to defend against datapoisoning-based Byzantine attacks. To further enhance the robustness of our scheme, each localmodel is weightedand aggregated according to the number of times it is identified as malicious. Relevant experimental data showthat our scheme is effective against Byzantine attacks in both independent identically distributed (IID) and nonindependentidentically distributed (non-IID) scenarios.
基金funded by National Natural Science Foundation of China under Grant No.61806171The Sichuan University of Science&Engineering Talent Project under Grant No.2021RC15Sichuan University of Science&Engineering Graduate Student Innovation Fund under Grant No.Y2023115,The Scientific Research and Innovation Team Program of Sichuan University of Science and Technology under Grant No.SUSE652A006.
文摘Deep Neural Networks(DNNs)are integral to various aspects of modern life,enhancing work efficiency.Nonethe-less,their susceptibility to diverse attack methods,including backdoor attacks,raises security concerns.We aim to investigate backdoor attack methods for image categorization tasks,to promote the development of DNN towards higher security.Research on backdoor attacks currently faces significant challenges due to the distinct and abnormal data patterns of malicious samples,and the meticulous data screening by developers,hindering practical attack implementation.To overcome these challenges,this study proposes a Gaussian Noise-Targeted Universal Adversarial Perturbation(GN-TUAP)algorithm.This approach restricts the direction of perturbations and normalizes abnormal pixel values,ensuring that perturbations progress as much as possible in a direction perpendicular to the decision hyperplane in linear problems.This limits anomalies within the perturbations improves their visual stealthiness,and makes them more challenging for defense methods to detect.To verify the effectiveness,stealthiness,and robustness of GN-TUAP,we proposed a comprehensive threat model.Based on this model,extensive experiments were conducted using the CIFAR-10,CIFAR-100,GTSRB,and MNIST datasets,comparing our method with existing state-of-the-art attack methods.We also tested our perturbation triggers using various defense methods and further experimented on the robustness of the triggers against noise filtering techniques.The experimental outcomes demonstrate that backdoor attacks leveraging perturbations generated via our algorithm exhibit cross-model attack effectiveness and superior stealthiness.Furthermore,they possess robust anti-detection capabilities and maintain commendable performance when subjected to noise-filtering methods.
文摘In recent years,the number of parameters of deep neural networks(DNNs)has been increasing rapidly.The training of DNNs is typically computation-intensive.As a result,many users leverage cloud computing and outsource their training procedures.Outsourcing computation results in a potential risk called backdoor attack,in which a welltrained DNN would performabnormally on inputs with a certain trigger.Backdoor attacks can also be classified as attacks that exploit fake images.However,most backdoor attacks design a uniformtrigger for all images,which can be easilydetectedand removed.In this paper,we propose a novel adaptivebackdoor attack.We overcome this defect and design a generator to assign a unique trigger for each image depending on its texture.To achieve this goal,we use a texture complexitymetric to create a specialmask for eachimage,which forces the trigger tobe embedded into the rich texture regions.The trigger is distributed in texture regions,which makes it invisible to humans.Besides the stealthiness of triggers,we limit the range of modification of backdoor models to evade detection.Experiments show that our method is efficient in multiple datasets,and traditional detectors cannot reveal the existence of a backdoor.
基金supported by the National Natural Science Foundation of China(Grant No.62076042)the National Key Research and Development Plan of China,Key Project of Cyberspace Security Governance(Grant No.2022YFB3103103)the Key Research and Development Project of Sichuan Province(Grant Nos.2022YFS0571,2021YFSY0012,2021YFG0332,and 2020YFG0307)。
文摘Backdoor attacks are emerging security threats to deep neural networks.In these attacks,adversaries manipulate the network by constructing training samples embedded with backdoor triggers.The backdoored model performs as expected on clean test samples but consistently misclassifies samples containing the backdoor trigger as a specific target label.While quantum neural networks(QNNs)have shown promise in surpassing their classical counterparts in certain machine learning tasks,they are also susceptible to backdoor attacks.However,current attacks on QNNs are constrained by the adversary's understanding of the model structure and specific encoding methods.Given the diversity of encoding methods and model structures in QNNs,the effectiveness of such backdoor attacks remains uncertain.In this paper,we propose an algorithm that leverages dataset-based optimization to initiate backdoor attacks.A malicious adversary can embed backdoor triggers into a QNN model by poisoning only a small portion of the data.The victim QNN maintains high accuracy on clean test samples without the trigger but outputs the target label set by the adversary when predicting samples with the trigger.Furthermore,our proposed attack cannot be easily resisted by existing backdoor detection methods.
基金This project was funded by the Deanship of Scientific Research(DSR)at King Abdulaziz University,Jeddah,under Grant No.(RG-91-611-42).
文摘The Deep Neural Networks(DNN)training process is widely affected by backdoor attacks.The backdoor attack is excellent at concealing its identity in the DNN by performing well on regular samples and displaying malicious behavior with data poisoning triggers.The state-of-art backdoor attacks mainly follow a certain assumption that the trigger is sample-agnostic and different poisoned samples use the same trigger.To overcome this problem,in this work we are creating a backdoor attack to check their strength to withstand complex defense strategies,and in order to achieve this objective,we are developing an improved Convolutional Neural Network(ICNN)model optimized using a Gradient-based Optimization(GBO)(ICNN-GBO)algorithm.In the ICNN-GBO model,we are injecting the triggers via a steganography and regularization technique.We are generating triggers using a single-pixel,irregular shape,and different sizes.The performance of the proposed methodology is evaluated using different performance metrics such as Attack success rate,stealthiness,pollution index,anomaly index,entropy index,and functionality.When the CNN-GBO model is trained with the poisoned dataset,it will map the malicious code to the target label.The proposed scheme’s effectiveness is verified by the experiments conducted on both the benchmark datasets namely CIDAR-10 andMSCELEB 1M dataset.The results demonstrate that the proposed methodology offers significant defense against the conventional backdoor attack detection frameworks such as STRIP and Neutral cleanse.
基金supported by the National Nature Science Foundation of China under Grant No.62272007National Nature Science Foundation of China under Grant No.U1936119Major Technology Program of Hainan,China(ZDKJ2019003)。
文摘Deep learning models are well known to be susceptible to backdoor attack,where the attacker only needs to provide a tampered dataset on which the triggers are injected.Models trained on the dataset will passively implant the backdoor,and triggers on the input can mislead the models during testing.Our study shows that the model shows different learning behaviors in clean and poisoned subsets during training.Based on this observation,we propose a general training pipeline to defend against backdoor attacks actively.Benign models can be trained from the unreli-able dataset by decoupling the learning process into three stages,i.e.,supervised learning,active unlearning,and active semi-supervised fine-tuning.The effectiveness of our approach has been shown in numerous experiments across various backdoor attacks and datasets.
基金Supported by the Fundamental Research Funds for the Central Universities(328202204)。
文摘Federated Learning(FL),a burgeoning technology,has received increasing attention due to its privacy protection capability.However,the base algorithm FedAvg is vulnerable when it suffers from so-called backdoor attacks.Former researchers proposed several robust aggregation methods.Unfortunately,due to the hidden characteristic of backdoor attacks,many of these aggregation methods are unable to defend against backdoor attacks.What's more,the attackers recently have proposed some hiding methods that further improve backdoor attacks'stealthiness,making all the existing robust aggregation methods fail.To tackle the threat of backdoor attacks,we propose a new aggregation method,X-raying Models with A Matrix(XMAM),to reveal the malicious local model updates submitted by the backdoor attackers.Since we observe that the output of the Softmax layer exhibits distinguishable patterns between malicious and benign updates,unlike the existing aggregation algorithms,we focus on the Softmax layer's output in which the backdoor attackers are difficult to hide their malicious behavior.Specifically,like medical X-ray examinations,we investigate the collected local model updates by using a matrix as an input to get their Softmax layer's outputs.Then,we preclude updates whose outputs are abnormal by clustering.Without any training dataset in the server,the extensive evaluations show that our XMAM can effectively distinguish malicious local model updates from benign ones.For instance,when other methods fail to defend against the backdoor attacks at no more than 20%malicious clients,our method can tolerate 45%malicious clients in the black-box mode and about 30%in Projected Gradient Descent(PGD)mode.Besides,under adaptive attacks,the results demonstrate that XMAM can still complete the global model training task even when there are 40%malicious clients.Finally,we analyze our method's screening complexity and compare the real screening time with other methods.The results show that XMAM is about 10–10000 times faster than the existing methods.
基金supported by the National Key Research and Development Program of China(No.2020AAA0106500)the National Natural Science Foundation of China(NSFC No.62236004).
文摘The pre-training-then-fine-tuning paradigm has been widely used in deep learning.Due to the huge computation cost for pre-training,practitioners usually download pre-trained models from the Internet and fine-tune them on downstream datasets,while the downloaded models may suffer backdoor attacks.Different from previous attacks aiming at a target task,we show that a backdoored pre-trained model can behave maliciously in various downstream tasks without foreknowing task information.Attackers can restrict the output representations(the values of output neurons)of trigger-embedded samples to arbitrary predefined values through additional training,namely neuron-level backdoor attack(NeuBA).Since fine-tuning has little effect on model parameters,the fine-tuned model will retain the backdoor functionality and predict a specific label for the samples embedded with the same trigger.To provoke multiple labels in a specific task,attackers can introduce several triggers with predefined contrastive values.In the experiments of both natural language processing(NLP)and computer vision(CV),we show that NeuBA can well control the predictions for trigger-embedded instances with different trigger designs.Our findings sound a red alarm for the wide use of pre-trained models.Finally,we apply several defense methods to NeuBA and find that model pruning is a promising technique to resist NeuBA by omitting backdoored neurons.
基金This work was supported by the National Natural Science Foundation of China under Grant No.62272007the National Natural Science Foundation of China under Grant No.U1936119the Major Science and Technology Project of Hainan Province under Grant No.ZDKJ2019003.
文摘Recently,deep neural networks have been shown to be vulnerable to backdoor attacks.A backdoor is inserted into neural networks via this attack paradigm,thus compromising the integrity of the network.As soon as an attacker presents a trigger during the testing phase,the backdoor in the model is activated,allowing the network to make specific wrong predictions.It is extremely important to defend against backdoor attacks since they are very stealthy and dangerous.In this paper,we propose a novel defense mechanism,Neural Behavioral Alignment(NBA),for backdoor removal.NBA optimizes the distillation process in terms of knowledge form and distillation samples to improve defense performance according to the characteristics of backdoor defense.NBA builds high-level representations of neural behavior within networks in order to facilitate the transfer of knowledge.Additionally,NBA crafts pseudo samples to induce student models exhibit backdoor neural behavior.By aligning the backdoor neural behavior from the student network with the benign neural behavior from the teacher network,NBA enables the proactive removal of backdoors.Extensive experiments show that NBA can effectively defend against six different backdoor attacks and outperform five state-of-the-art defenses.
文摘高精度联邦学习模型的训练需要消耗大量的用户本地资源,参与训练的用户能够通过私自出售联合训练的模型获得非法收益.为实现联邦学习模型的产权保护,利用深度学习后门技术不影响主任务精度而仅对少量触发集样本造成误分类的特征,构建一种基于模型后门的联邦学习水印(federated learning watermark based on backdoor,FLWB)方案,能够允许各参与训练的用户在其本地模型中分别嵌入私有水印,再通过云端的模型聚合操作将私有后门水印映射到全局模型作为联邦学习的全局水印.之后提出分步训练方法增强各私有后门水印在全局模型的表达效果,使得FLWB方案能够在不影响全局模型精度的前提下容纳各参与用户的私有水印.理论分析证明了FLWB方案的安全性,实验验证分步训练方法能够让全局模型在仅造成1%主任务精度损失的情况下有效容纳参与训练用户的私有水印.最后,采用模型压缩攻击和模型微调攻击对FLWB方案进行攻击测试,其结果表明FLWB方案在模型压缩到30%时仍能保留80%以上的水印,在4种不同的微调攻击下能保留90%以上的水印,具有很好的鲁棒性.
文摘针对现有联邦学习后门防御方法不能实现对模型已嵌入后门特征的有效清除同时会降低主任务准确率的问题,提出了一种基于对比训练的联邦学习后门防御方法 Contra FL。利用对比训练来破坏后门样本在特征空间中的聚类过程,使联邦学习全局模型分类结果与后门触发器特征无关。具体而言,服务器通过执行触发器生成算法构造生成器池,以还原全局模型训练样本中可能存在的后门触发器;进而,服务器将触发器生成器池下发给各参与方,各参与方将生成的后门触发器添加至本地样本,以实现后门数据增强,最终通过对比训练有效消除后门攻击的负面影响。实验结果表明,Contra FL能够有效防御联邦学习中的多种后门攻击,且效果优于现有防御方法。