Abstract
Fire control radar (FCR) frequently faces repeater jamming. Considering the multi-stage confrontation between the radar and the jammer, this paper proposes a beam dwell time optimization method based on model-free reinforcement learning to handle multi-stage radar beam management when the environment model is unknown. First, a Markov decision process with an unknown environment model is built for multi-stage dwell time optimization, and the expected search-to-lock-on time of the FCR is adopted as the criterion for evaluating radar detection performance. Then, to overcome the challenge posed by the unknown environment model, a reinforcement learning framework for multi-stage dwell time optimization is formulated, and on this basis a Q-learning-based dwell time optimization method is proposed. Finally, numerical simulations verify the effectiveness of the proposed method.
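The abstract does not give the state, action, or reward definitions, so the following is only a minimal sketch of how tabular Q-learning can choose dwell times over several stages without a model. Everything in it is a hypothetical assumption: three stages before lock-on, three candidate dwell times (`DWELLS`), an exponential detection law (`DETECT_RATE`), a repeater jammer that degrades any dwell longer than `JAM_LATENCY`, and a rule that any failure sends the radar back to search. Each action picks a dwell time, the reward is minus that dwell time, and the return is undiscounted, so maximizing the expected return is the same as minimizing the expected search-to-lock-on time used as the criterion above. The learner only sees sampled transitions from `step`; `expected_lock_time` reads the model and is included only to check the learned policy.

```python
import math
import random

# Hypothetical toy environment (illustrative assumptions, not the paper's model).
DWELLS = [1.0, 2.0, 4.0]       # candidate dwell times, arbitrary time units
NUM_STAGES = 3                 # stages 0..2; reaching stage 3 means lock-on
DETECT_RATE = [0.4, 0.6, 0.8]  # assumed per-stage detection rate
JAM_LATENCY = 2.5              # assumed delay before the repeater jammer responds
JAM_PENALTY = 0.3              # assumed success-probability factor under jamming


def success_prob(stage, dwell):
    """Assumed chance that a dwell of this length completes the stage."""
    p = 1.0 - math.exp(-DETECT_RATE[stage] * dwell)
    if dwell > JAM_LATENCY:  # long dwells give the jammer time to react
        p *= JAM_PENALTY
    return p


def step(stage, action, rng):
    """Sample one transition; the learner never reads success_prob directly."""
    dwell = DWELLS[action]
    if rng.random() < success_prob(stage, dwell):
        next_stage = stage + 1
    else:
        next_stage = 0  # assumed: any failure sends the radar back to search
    return next_stage, -dwell  # reward is minus the time spent


def q_learning(episodes=30000, alpha=0.01, epsilon=0.1, seed=0):
    """Tabular Q-learning, undiscounted, with epsilon-greedy exploration."""
    rng = random.Random(seed)
    n_actions = len(DWELLS)
    q = [[0.0] * n_actions for _ in range(NUM_STAGES)]
    for _ in range(episodes):
        stage = 0
        for _ in range(1000):  # safety cap on episode length
            if rng.random() < epsilon:
                a = rng.randrange(n_actions)
            else:
                a = max(range(n_actions), key=lambda i: q[stage][i])
            nxt, r = step(stage, a, rng)
            # Lock-on is terminal with value 0; otherwise bootstrap with gamma = 1.
            target = r if nxt == NUM_STAGES else r + max(q[nxt])
            q[stage][a] += alpha * (target - q[stage][a])
            stage = nxt
            if stage == NUM_STAGES:
                break
    return q


def greedy_policy(q):
    """Index of the highest-valued dwell time in each stage."""
    return tuple(max(range(len(row)), key=row.__getitem__) for row in q)


def expected_lock_time(policy, iters=3000):
    """Expected search-to-lock-on time of a fixed policy.

    Uses the (normally unknown) model, so it serves only to check the learner.
    """
    t = [0.0] * (NUM_STAGES + 1)  # t[NUM_STAGES] stays 0 at lock-on
    for _ in range(iters):
        for k in reversed(range(NUM_STAGES)):
            d = DWELLS[policy[k]]
            p = success_prob(k, d)
            t[k] = d + p * t[k + 1] + (1.0 - p) * t[0]
    return t[0]


if __name__ == "__main__":
    policy = greedy_policy(q_learning())
    print("dwell per stage:", [DWELLS[a] for a in policy])
    print("expected lock-on time:", round(expected_lock_time(policy), 2))
```

With these assumed numbers, a hand calculation gives the schedule (1, 2, 2), a short first dwell followed by longer ones, as the optimum, with an expected lock-on time of about 11.5 time units: a failure late in the chain wastes every earlier stage, so later stages justify longer dwells, while dwells past `JAM_LATENCY` never pay off. The greedy policy learned from samples should land on or near that schedule; a smaller `alpha` with more episodes reduces the sampling noise.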
Authors
MA Zhijie
WANG Yuanhang
JIANG Jiacai
ZHANG Tianxian
School of Information and Communication Engineering, University of Electronic Science and Technology of China, Chengdu 611731, China; No.10 Research Institute of China Electronics Technology Group Corporation, Chengdu 610036, China
Source
Modern Radar
CSCD
Peking University Core Journal
2022, No. 11, pp. 44-50 (7 pages)
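Funding
National Natural Science Foundation of China (61971109)
National Defense Science and Technology Innovation Special Zone Project (Key Project)
Fundamental Research Funds for the Central Universities (ZYGX2018J009)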
Keywords
radar beam management
multi-stage dwell time optimization
unknown environment model
Q-learning