Abstract
Selecting penetration actions with a policy network to discover the optimal attack path is a key technique in automated penetration testing. However, existing methods suffer from too many invalid actions and slow convergence during training. To address these problems, this paper applies the Proximal Policy Optimization (PPO) algorithm to attack path optimization and proposes IPPOPAS (Improved PPO with Penetration Action Selection), an improved PPO algorithm with a penetration action selection module that filters actions according to the penetration testing scenario while episode experience is being collected. The paper designs and implements the components of IPPOPAS, including the policy network, the value network, and the penetration action selection module, improves the action selection process, and performs parameter tuning and algorithm optimization to improve performance and efficiency. Experimental results show that, in specific network scenarios, IPPOPAS converges faster than the traditional deep reinforcement learning algorithm DQN (Deep Q Network) and its improved variants, and that it converges even faster as the number of vulnerabilities on the hosts increases. The experiments also verify the effectiveness of IPPOPAS as the network scale grows.
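The abstract says that actions are filtered according to the penetration testing scenario while experience is collected, but it does not describe how the selection module works. As a rough illustration only, the sketch below shows one common way to keep a policy from sampling invalid actions: masking their logits before the softmax. The function names, the logits, and the validity mask are all hypothetical and are not taken from the paper.

```python
import math
import random


def masked_action_probs(logits, valid_mask):
    """Softmax over logits, with invalid actions forced to probability 0."""
    if len(logits) != len(valid_mask):
        raise ValueError("logits and valid_mask must have the same length")
    if not any(valid_mask):
        raise ValueError("at least one action must be valid")
    # Subtract the largest valid logit for numerical stability.
    max_logit = max(l for l, ok in zip(logits, valid_mask) if ok)
    exps = [math.exp(l - max_logit) if ok else 0.0
            for l, ok in zip(logits, valid_mask)]
    total = sum(exps)
    return [e / total for e in exps]


def sample_action(logits, valid_mask, rng=random):
    """Sample an action index from the masked distribution."""
    probs = masked_action_probs(logits, valid_mask)
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]


# Hypothetical example: four candidate actions, of which only
# actions 0 and 2 are applicable in the current scenario.
logits = [1.0, 3.0, 1.0, 0.5]
valid_mask = [True, False, True, False]

probs = masked_action_probs(logits, valid_mask)
action = sample_action(logits, valid_mask, random.Random(0))
```

In this sketch, action 1 has the highest raw logit but is never chosen, because the mask removes it before probabilities are normalized. Whether IPPOPAS filters actions this way or by some other rule is not stated in the abstract.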
Authors
ZHANG Guomin (张国敏)
ZHANG Shaoyong (张少勇)
ZHANG Jinwei (张津威)
Institute of Command and Control Engineering, Army Engineering University of PLA, Nanjing 210007, China
Source
Netinfo Security (《信息网络安全》)
CSCD
PKU Core Journal (北大核心)
2023, No. 9, pp. 47-57 (11 pages)
Funding
National Natural Science Foundation of China [62172432].
Keywords
automated penetration testing
policy network
PPO algorithm
attack path discovery