Discovery and Optimization Method of Attack Paths Based on PPO Algorithm

Cited by: 1

Abstract: Selecting penetration actions with a policy network to discover the optimal attack path is a key technique in automated penetration testing. However, existing methods suffer from too many invalid actions and slow convergence during training. To address these problems, the paper applies the Proximal Policy Optimization (PPO) algorithm to attack path optimization and proposes IPPOPAS (Improved PPO with Penetration Action Selection), an improved PPO algorithm with a penetration action selection module that filters actions according to the penetration testing scenario while episode experience is being collected. The paper designs and implements the components of IPPOPAS, including the policy network, the value network, and the penetration action selection module, improves the action selection process, and performs parameter tuning and algorithm optimization to raise performance and efficiency. Experimental results show that, in specific network scenarios, IPPOPAS converges faster than the traditional deep reinforcement learning algorithm DQN (Deep Q Network) and its improved variants, and that it converges even faster as the number of vulnerabilities on the hosts increases. The experiments also validate the effectiveness of IPPOPAS when the network scale is expanded.
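The abstract does not say how the penetration action selection module filters actions. One common way to realize such filtering is action masking, sketched below under that assumption: actions marked invalid for the current scenario get zero probability before sampling, and the standard PPO clipped surrogate is then evaluated for the sampled action. The function names, the example logits, and the validity mask are all hypothetical and are not taken from the paper.

```python
import math
import random


def masked_softmax(logits, valid_mask):
    """Softmax over the valid actions only; invalid actions get probability 0."""
    if not any(valid_mask):
        raise ValueError("at least one action must be valid")
    # Subtract the largest valid logit for numerical stability.
    max_logit = max(l for l, ok in zip(logits, valid_mask) if ok)
    exps = [math.exp(l - max_logit) if ok else 0.0
            for l, ok in zip(logits, valid_mask)]
    total = sum(exps)
    return [e / total for e in exps]


def sample_action(logits, valid_mask, rng):
    """Sample an action index from the masked policy; return (index, probability)."""
    probs = masked_softmax(logits, valid_mask)
    action = rng.choices(range(len(probs)), weights=probs, k=1)[0]
    return action, probs[action]


def clipped_surrogate(new_prob, old_prob, advantage, epsilon=0.2):
    """Standard PPO clipped objective for a single (state, action) sample."""
    ratio = new_prob / old_prob
    clipped_ratio = max(1.0 - epsilon, min(ratio, 1.0 + epsilon))
    return min(ratio * advantage, clipped_ratio * advantage)


if __name__ == "__main__":
    # Hypothetical scenario: three candidate actions; the second one (say, an
    # exploit for a service the target host does not run) is filtered out.
    logits = [0.5, 5.0, 0.1]
    valid_mask = [True, False, True]
    rng = random.Random(0)
    action, prob = sample_action(logits, valid_mask, rng)
    print("sampled action:", action, "probability:", round(prob, 3))
```

Masking before sampling keeps invalid actions out of the collected episodes altogether, which is one plausible reading of "filtering actions while episode experience is being collected"; the paper's actual module may work differently.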

Authors: ZHANG Guomin (张国敏); ZHANG Shaoyong (张少勇); ZHANG Jinwei (张津威) (Institute of Command and Control Engineering, Army Engineering University of PLA, Nanjing 210007, China)

Affiliation: Institute of Command and Control Engineering, Army Engineering University of PLA (陆军工程大学指挥控制工程学院)

Source: Netinfo Security (《信息网络安全》), indexed in CSCD and the Peking University Core Journals list, 2023, No. 9, pp. 47-57 (11 pages)

Funding: National Natural Science Foundation of China [62172432].

Keywords: automated penetration testing; policy network; PPO algorithm; attack path discovery

Classification: TP309 [Automation and Computer Technology: Computer System Architecture]
