
Allocation method of communication interference resource based on deep reinforcement learning of maximum policy entropy (cited by: 10)
Abstract: To address the optimization of interference resource allocation in communication network countermeasures, an allocation method based on maximum policy entropy deep reinforcement learning (MPEDRL) is proposed. The method introduces deep reinforcement learning into interference resource allocation for communication countermeasures, and adds a maximum policy entropy criterion with an adaptively adjusted entropy coefficient to strengthen policy exploration and accelerate convergence toward the global optimum. Interference resource allocation is modeled as a Markov decision process: an interference policy network outputs the allocation scheme, an interference effect evaluation network with a clipped twin structure evaluates the efficacy of that scheme, and both networks are trained to maximize policy entropy and cumulative interference efficacy in order to decide the optimal allocation scheme. Simulation results show that the proposed method effectively solves the interference resource allocation problem in network confrontation. Compared with existing deep reinforcement learning methods, it learns faster and fluctuates less during training, and its interference efficacy is 15% higher than that of the DDPG method.
Authors: RAO Ning (饶宁), XU Hua (许华), QI Zisen (齐子森), SONG Bailin (宋佰霖), SHI Yunhao (史蕴豪) (College of Information and Navigation, Air Force Engineering University, Xi'an 710077, China)
Source: Journal of Northwestern Polytechnical University (《西北工业大学学报》), 2021, No. 5, pp. 1077-1086 (10 pages). Indexed in EI, CAS, CSCD, and the Peking University Core Journals list.
Funding: Supported by the National Natural Science Foundation of China (61601500).
Keywords: interference resource allocation; deep reinforcement learning; maximum policy entropy; deep neural network
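
Illustrative note: the abstract names three ingredients (a maximum policy entropy objective, an adaptively adjusted entropy coefficient, and a clipped twin evaluation network) but does not give the update equations. The sketch below shows how these ingredients are commonly combined in soft actor-critic style methods; treating MPEDRL as following that formulation is an assumption, not something this record confirms. The function names, the discount factor `gamma`, the learning rate `lr`, and `target_entropy` are all hypothetical.

```python
import numpy as np


def soft_td_target(reward, q1_next, q2_next, logp_next, alpha, gamma=0.99):
    """Entropy-regularised TD target built from two critic estimates.

    Assumption: a soft actor-critic style target. Taking the element-wise
    minimum of the two critics is the usual "clipped twin" trick for
    limiting value overestimation.
    """
    q_min = np.minimum(q1_next, q2_next)
    # Subtracting alpha * log pi adds an entropy bonus to the target.
    return reward + gamma * (q_min - alpha * logp_next)


def update_log_alpha(log_alpha, logp_batch, target_entropy, lr=1e-2):
    """One gradient-descent step on J(alpha) = E[-alpha * (log pi + target_entropy)].

    Assumption: alpha is parameterised as exp(log_alpha) so it stays positive.
    The coefficient grows when the policy entropy (-mean log pi) is below the
    target and shrinks when it is above.
    """
    alpha = np.exp(log_alpha)
    grad = -alpha * (np.mean(logp_batch) + target_entropy)  # dJ / d(log_alpha)
    return log_alpha - lr * grad
```

In this sketch a larger coefficient weights exploration more heavily, so adapting it during training is one plausible reading of how the entropy coefficient could be adjusted automatically.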
