
A Multi-functional Radar Jamming Decision Method Based on Proximal Policy Optimization Algorithm and Mask-TIT Network
Abstract: To cope with the challenges that increasingly intelligent multi-functional radars pose to the opposing side, this paper proposes a jamming decision-making method based on the proximal policy optimization (PPO) algorithm and the Mask-Transformer in Transformer (Mask-TIT) network. First, starting from a realistic scenario, the adversarial interaction between the jammer and the radar is modeled as a partially observable Markov decision process (POMDP); a new state transition function and reward function are designed from the radar's working principles, and the observation space is designed according to the hierarchical model of the multi-functional radar. Second, a Mask-TIT network structure is designed that exploits the Transformer's capacity to represent sequence data together with the characteristics of radar jamming patterns, and it is used to build a more powerful Actor-Critic architecture. Finally, the PPO algorithm is used for optimization. Experimental results show that, compared with existing methods, the proposed algorithm reduces the interaction data required for convergence by 25.6% on average and significantly lowers the post-convergence variance.
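The two standard mechanisms named in the abstract (the PPO clipped surrogate objective, and masking of unavailable actions in the actor's output) can be sketched as follows. This is a generic illustration of those textbook techniques only, not the authors' Mask-TIT implementation; the helper names `ppo_clipped_objective` and `masked_action_probs` are hypothetical.

```python
import numpy as np

def ppo_clipped_objective(ratio, advantage, eps=0.2):
    """Standard PPO clipped surrogate objective (to be maximized).

    ratio:     pi_new(a|s) / pi_old(a|s), one value per sample
    advantage: estimated advantage, one value per sample
    eps:       clipping range (0.2 is the common default)
    """
    unclipped = ratio * advantage
    clipped = np.clip(ratio, 1.0 - eps, 1.0 + eps) * advantage
    # The elementwise minimum keeps the update pessimistic, which is
    # what limits destructively large policy steps in PPO.
    return float(np.minimum(unclipped, clipped).mean())

def masked_action_probs(logits, valid_mask):
    """Exclude invalid jamming patterns before sampling an action.

    Invalid actions receive -inf logits, so the softmax assigns them
    exactly zero probability.
    """
    masked = np.where(valid_mask, logits, -np.inf)
    exp = np.exp(masked - masked.max())  # shift by max for stability
    return exp / exp.sum()
```

In an Actor-Critic setup such as the one described above, the actor would produce `logits` over jamming patterns, `valid_mask` would encode which patterns are currently usable, and the masked probabilities would drive both action sampling and the `ratio` fed to the clipped objective.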
Authors: LOU Yuxuan, SUN Minhong, YIN Shuai (School of Communication Engineering, Hangzhou Dianzi University, Hangzhou 310018, China)
Source: Journal of Data Acquisition and Processing (《数据采集与处理》, CSCD, Peking University Core Journal), 2024, No. 6, pp. 1355-1369 (15 pages)
Keywords: radar jamming decision; partially observable Markov decision process (POMDP); reinforcement learning; Transformer; proximal policy optimization (PPO)
