Robustness Enhancement Method for Deep Reinforcement Learning

Abstract: Deep reinforcement learning (DRL) combines the perception ability of deep learning with the decision-making ability of reinforcement learning and has been applied in many fields. However, once an attacker steals DRL data, the attacker can perturb the state, reward, action, or environment and thereby influence the agent's decisions. Moreover, previous studies have shown that DRL models are highly vulnerable to malicious attacks: using state and action space information, an attacker can train an equivalent model to mount a black-box attack. To protect DRL data privacy and enhance model robustness, this paper proposes a DRL model based on vertical federated learning (Vertical Federated based DRL, VF-DRL). VF-DRL builds multiple clients and ensures that their data features do not overlap. Meanwhile, only the hidden-layer features output by each client are uploaded to the server, which protects data privacy. The paper further compares VF-DRL against different baseline algorithms and evaluates its performance through extensive experiments. Assuming that one malicious client executes adversarial attacks, the robustness of VF-DRL is verified with a variety of adversarial attack methods. Robustness is also verified in both high-dimensional and lower-dimensional environments, and the factors that affect it are further analyzed.
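The client/server split described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the names `Client`, `Server`, `split_features`, and `select_action`, the single linear-plus-ReLU encoder, the random weights, and the greedy action choice are all assumptions made for illustration. The sketch only shows that each client sees a disjoint slice of the state and that the server receives hidden features rather than raw features.

```python
import random


def relu(v):
    return [max(0.0, x) for x in v]


def linear(weights, v):
    # weights is a list of rows; returns the matrix-vector product.
    return [sum(w * x for w, x in zip(row, v)) for row in weights]


def make_weights(rng, rows, cols):
    # Random weights stand in for trained parameters (assumption).
    return [[rng.uniform(-1.0, 1.0) for _ in range(cols)] for _ in range(rows)]


def split_features(n_features, n_clients):
    """Partition feature indices into disjoint contiguous slices."""
    base, extra = divmod(n_features, n_clients)
    slices, start = [], 0
    for k in range(n_clients):
        size = base + (1 if k < extra else 0)
        slices.append(list(range(start, start + size)))
        start += size
    return slices


class Client:
    """Hypothetical client: owns one feature slice and a local encoder."""

    def __init__(self, rng, feature_idx, hidden_dim):
        self.feature_idx = feature_idx
        self.weights = make_weights(rng, hidden_dim, len(feature_idx))

    def encode(self, state):
        # Only this client's features are read; only hidden features leave it.
        local = [state[i] for i in self.feature_idx]
        return relu(linear(self.weights, local))


class Server:
    """Hypothetical server: maps concatenated hidden features to Q-values."""

    def __init__(self, rng, total_hidden, n_actions):
        self.weights = make_weights(rng, n_actions, total_hidden)

    def q_values(self, hidden_parts):
        joint = [h for part in hidden_parts for h in part]
        return linear(self.weights, joint)


def select_action(state, clients, server):
    hidden = [c.encode(state) for c in clients]
    q = server.q_values(hidden)
    # Greedy choice over Q-values (an assumed policy, for illustration only).
    return max(range(len(q)), key=q.__getitem__), q
```

In this sketch, a perturbation confined to one client's feature slice changes only that client's hidden output; the other clients' hidden features are unaffected. Whether and how much this limits an attack in the actual VF-DRL model is what the paper's experiments evaluate.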

Authors: GE Jie (葛杰); ZHENG Haibin (郑海斌); CHEN Jinyin (陈晋音) (College of Information Engineering, Zhejiang University of Technology, Hangzhou 310023, China; Institute of Cyberspace Security, Zhejiang University of Technology, Hangzhou 310023, China)

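Affiliations: College of Information Engineering, Zhejiang University of Technology; Institute of Cyberspace Security, Zhejiang University of Technology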

Source: Journal of Chinese Computer Systems (小型微型计算机系统), CSCD, PKU Core (北大核心), 2024, No. 7, pp. 1552-1560 (9 pages)

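Funding: Supported by the National Natural Science Foundation of China (62072406), the Natural Science Foundation of Zhejiang Province (DQ23F020001), and the Key Laboratory of Information System Security Technology Fund (61421110502).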

Keywords: deep reinforcement learning; vertical federated learning; privacy protection; adversarial attack; robustness enhancement

Classification: TP183 [Automation and Computer Technology: Control Theory and Control Engineering]

