Abstract
Cognitive radar countermeasure technology enables a jamming system to learn autonomously and make intelligent jamming decisions without prior knowledge. Existing jamming decision methods based on reinforcement learning theory struggle to obtain a high expected benefit in radar countermeasure environments where real-time response is required, jamming time is limited, and the radar strategy changes rapidly. Based on multi-armed bandit (MAB) decision theory, this paper proposes an online jamming decision method for time-varying environments that uses maximum expected value weighted (MEVW) estimation and a learning-window shifting (LWS) approach. MEVW improves the accuracy of identifying the arm with the maximum benefit, and LWS allows the jamming decision to adapt to the radar's time-varying environment. Numerical simulations in typical time-varying environment settings show that the proposed method achieves higher decision benefit and better adaptability to environmental change than traditional methods.
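The abstract gives no formulas, so the following is only a minimal sketch under stated assumptions, not the authors' algorithm. It assumes that MEVW can be read as the weighted estimator of the maximum expected value (each arm's sample mean approximated as Gaussian and weighted by the probability that the arm is the best), that LWS can be approximated by a fixed-length sliding window over recent (arm, reward) pairs, and that the decision rule plays each jamming style with its estimated probability of being best. The names `max_probability_weights`, `weighted_max_estimate`, `SlidingWindowWeightedBandit` and `simulate`, as well as the window length, the standard-deviation floor and the toy reward probabilities, are all hypothetical.

```python
import math
import random
import statistics
from collections import deque


def max_probability_weights(means, stds, points=161, span=8.0):
    """w_i ~= P(arm i has the largest true mean), treating each sample mean as
    N(means[i], stds[i]**2). Computes w_i = integral of pdf_i(x) * prod_{j!=i} cdf_j(x) dx
    with the trapezoidal rule, in arm i's standardized coordinate z = (x - mu_i) / s_i."""
    n = len(means)
    step = 2.0 * span / (points - 1)
    weights = []
    for i in range(n):
        total = 0.0
        for k in range(points):
            z = -span + k * step
            x = means[i] + stds[i] * z
            f = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)  # standard normal pdf
            for j in range(n):
                if j != i:  # probability that arm j's mean lies below x
                    f *= 0.5 * (1.0 + math.erf((x - means[j]) / (stds[j] * math.sqrt(2.0))))
            total += f * (0.5 if k in (0, points - 1) else 1.0)  # trapezoid end weights
        weights.append(total * step)
    norm = sum(weights)  # analytically 1; renormalize away quadrature error
    return [w / norm for w in weights]


def weighted_max_estimate(means, stds):
    """Weighted estimate of max_i E[X_i]: sum_i w_i * mean_i."""
    w = max_probability_weights(means, stds)
    return sum(wi * mi for wi, mi in zip(w, means))


class SlidingWindowWeightedBandit:
    """Hypothetical online chooser over candidate jamming styles (arms).

    Only the last `window` (arm, reward) pairs are kept, so old evidence
    expires when the radar changes strategy."""

    def __init__(self, n_arms, window, std_floor=1e-2, seed=0):
        self.n_arms = n_arms
        self.history = deque(maxlen=window)  # oldest pairs drop out automatically
        self.std_floor = std_floor  # hypothetical guard against zero sample variance
        self.rng = random.Random(seed)

    def select(self):
        rewards = [[] for _ in range(self.n_arms)]
        for arm, r in self.history:
            rewards[arm].append(r)
        for arm, rs in enumerate(rewards):
            if len(rs) < 2:  # need two samples in the window to estimate a variance
                return arm
        means = [statistics.fmean(rs) for rs in rewards]
        # standard error of each arm's sample mean inside the window
        stds = [max(statistics.stdev(rs), self.std_floor) / math.sqrt(len(rs)) for rs in rewards]
        w = max_probability_weights(means, stds)
        # probability matching: play each arm with its estimated chance of being best
        return self.rng.choices(range(self.n_arms), weights=w)[0]

    def update(self, arm, reward):
        self.history.append((arm, reward))


def simulate(rounds=1000, switch_at=500, window=100, seed=1):
    """Toy time-varying radar: arm success probabilities reverse at `switch_at`."""
    before, after = [0.8, 0.5, 0.2], [0.2, 0.5, 0.8]
    env = random.Random(seed)
    agent = SlidingWindowWeightedBandit(n_arms=3, window=window, seed=seed)
    chosen = []
    for t in range(rounds):
        p = before if t < switch_at else after
        arm = agent.select()
        agent.update(arm, 1.0 if env.random() < p[arm] else 0.0)
        chosen.append(arm)
    return chosen
```

In the toy run the success probabilities reverse at round 500. Because evidence older than `window` rounds is discarded, the sketch should shift toward the new best arm within roughly one window length; a longer window gives lower-variance estimates but reacts more slowly to a change in radar strategy.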
Authors
WANG Jun
YE Licheng
LIU Shuai
HAN Dongmei
(School of Information Science and Engineering, Harbin Institute of Technology at Weihai, Weihai 264209, China; Shandong New Beiyang Information Technology Co., Ltd., Weihai 264203, China)
Source
Modern Radar
CSCD
Peking University Core Journals
2021, No. 3, pp. 30-36 (7 pages)
Funding
Supported by the National Natural Science Foundation of China (62071144).
Keywords
cognitive radar countermeasure
time-varying environment
jamming strategy
multi-armed bandit
maximum expected value weighting