Abstract
This paper combines hierarchical reinforcement learning with penetration testing. It models the penetration testing process as a semi-Markov decision process (SMDP) and trains an Agent to discover penetration testing paths in a simulated network environment. The paper proposes an improved automatic hierarchical memory Deep Q-Network (AHM-DQN) algorithm based on the Actor-Critic framework. First, a bidirectional recurrent neural network is added to the Actor network as an information exchange layer for the same Agent. Second, information from other types of Agents is added to the Critic network so that cooperative strategies among multiple Agents can be learned. Compared with the standard Actor-Critic algorithm, AHM-DQN makes two further improvements. (1) It integrates automatic hierarchy construction, which automatically decomposes the selection of task goals and actions into levels and improves the algorithm's efficiency. (2) It incorporates a memory factor that helps the Agent remember and learn effectively, addresses the sparsity of reward values, and improves the algorithm's accuracy. The algorithm outperforms traditional hierarchical learning algorithms in learning efficiency and convergence speed, and addresses the problem that penetration testing relies mainly on manual work.
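The key modelling step in the abstract is the semi-Markov decision process. In an SMDP, a high-level action (an option such as "scan, then exploit") runs for several primitive steps τ. Its accumulated reward is R = Σ_k γ^k r_k, and the bootstrap term is discounted by γ^τ instead of γ. The following is a minimal sketch of that generic SMDP Q-learning update on a hypothetical three-host network. It is not the authors' AHM-DQN: the hosts, option names, durations, rewards and hyperparameters are all invented for illustration. It also uses a plain Q-table in place of the paper's Actor-Critic networks, bidirectional RNN and memory factor.

```python
import random

# HYPOTHETICAL toy network, not from the paper: state = host the agent controls
# (0 = attacker foothold, 1 = web server, 2 = database server, the goal).
GOAL = 2
# Each option: state -> {name: (next_state, duration tau, rewards of its tau steps)}
OPTIONS = {
    0: {
        "scan_then_exploit_web": (1, 3, [-1.0, -1.0, 5.0]),
        "noisy_bruteforce_web": (1, 6, [-1.0] * 5 + [5.0]),
    },
    1: {
        "pivot_to_db": (2, 2, [-1.0, 20.0]),
        "stay_and_enumerate": (1, 1, [-1.0]),
    },
}


def discounted_return(rewards, gamma):
    """Cumulative reward of one option: R = sum_k gamma**k * r_k."""
    return sum((gamma ** k) * r for k, r in enumerate(rewards))


def smdp_q_learning(episodes=500, alpha=0.1, gamma=0.9, epsilon=0.2, seed=0):
    """Tabular SMDP Q-learning over options (illustrative hyperparameters)."""
    rng = random.Random(seed)
    q = {s: {o: 0.0 for o in opts} for s, opts in OPTIONS.items()}
    for _ in range(episodes):
        s = 0
        while s != GOAL:
            # Epsilon-greedy selection among the options available in state s.
            if rng.random() < epsilon:
                o = rng.choice(list(q[s]))
            else:
                o = max(q[s], key=q[s].get)
            s_next, tau, rewards = OPTIONS[s][o]
            # The goal state is terminal, so its value is taken as 0.
            future = 0.0 if s_next == GOAL else max(q[s_next].values())
            # SMDP target: the bootstrap term is discounted by gamma**tau.
            target = discounted_return(rewards, gamma) + (gamma ** tau) * future
            q[s][o] += alpha * (target - q[s][o])
            s = s_next
    return q


if __name__ == "__main__":
    q = smdp_q_learning()
    for state, values in q.items():
        best = max(values, key=values.get)
        print(state, best, round(values[best], 2))
```

After training, the greedy option is `scan_then_exploit_web` on the foothold and `pivot_to_db` on the web server. Both foothold options eventually reach the web server. The shorter option wins partly because its continuation value is discounted by γ^3 rather than γ^6, which is the effect the SMDP formulation is meant to capture.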
Authors
LU Yan; YANG Qiufen (Hunan Open University, Changsha, Hunan Province, 410004, China)
Source
Science & Technology Information (科技资讯), 2022, Issue 21, pp. 5-10 (6 pages)
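Funding
Hunan Open University 2021 university-level research project "Application of an Improved Hierarchical Reinforcement Learning Algorithm to Automated Penetration Testing Path Discovery" (Project No. XDK-2021-A-4)
Scientific research project of the Education Department of Hunan Province "Research on Optimizing the DDPG Algorithm Based on the Actor-Critic Framework" (Project No. 21C1186)
Hunan Province vocational college education and teaching reform research project "Research on Deep-Learning-Based Evaluation of Classroom Teaching in Higher Vocational Colleges" (Project No. ZJGB2021189)
Natural Science Foundation of Hunan Province project "Research on an AdaBoost-Based Yawn Detection Algorithm" (Project No. 2021JJ60038)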