Abstract
To address the slow convergence of reinforcement-learning-based jamming decision methods, this paper proposes a Dyna-Q jamming decision algorithm with a self-adaptive number of planning steps. While preserving the effectiveness of the jamming strategy, the algorithm improves the convergence speed of reinforcement learning, so that the optimal jamming strategy is learned sooner. Experimental and simulation results show that the algorithm achieves real-time, effective jamming of multi-function radar. The approach can also be extended to other reinforcement learning applications and may serve as a reference for them.
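As a rough illustration of the idea named in the abstract, the following is a minimal sketch of tabular Dyna-Q in which the number of planning steps varies from one real step to the next. The abstract does not state the paper's adaptation rule, so `adaptive_steps` is a hypothetical stand-in: it plans more when the latest temporal-difference error is large. The six-state chain environment, the function names, and all parameter values are likewise assumptions for illustration only, not the radar jamming model or the authors' implementation.

```python
import random
from collections import defaultdict

N_STATES = 6       # hypothetical toy chain: states 0..5, state 5 is terminal
ACTIONS = (0, 1)   # 0 = move left, 1 = move right


def step(state, action):
    """Deterministic toy environment: reward 1 on reaching the last state."""
    nxt = max(0, state - 1) if action == 0 else min(N_STATES - 1, state + 1)
    done = nxt == N_STATES - 1
    return nxt, (1.0 if done else 0.0), done


def adaptive_steps(td_error, n_min=1, n_max=20, scale=50.0):
    """Hypothetical rule (not from the paper): plan more while the value
    estimates are still changing, clipped to the range [n_min, n_max]."""
    return max(n_min, min(n_max, int(scale * abs(td_error))))


def greedy(q, state, rng):
    """Greedy action with random tie-breaking."""
    best = max(q[(state, a)] for a in ACTIONS)
    return rng.choice([a for a in ACTIONS if q[(state, a)] == best])


def dyna_q(episodes=50, alpha=0.5, gamma=0.95, epsilon=0.1, seed=0):
    rng = random.Random(seed)
    q = defaultdict(float)   # action-value table, initialised to 0
    model = {}               # learned model: (s, a) -> (s', r, done)
    for _ in range(episodes):
        state, done = 0, False
        while not done:
            # Epsilon-greedy action selection on the real environment.
            if rng.random() < epsilon:
                action = rng.choice(ACTIONS)
            else:
                action = greedy(q, state, rng)
            nxt, reward, done = step(state, action)

            # Direct Q-learning update from the real transition.
            future = 0.0 if done else gamma * max(q[(nxt, a)] for a in ACTIONS)
            td_error = reward + future - q[(state, action)]
            q[(state, action)] += alpha * td_error
            model[(state, action)] = (nxt, reward, done)

            # Planning: replay simulated transitions from the learned model,
            # with the step count chosen by the hypothetical adaptive rule.
            for _ in range(adaptive_steps(td_error)):
                (s, a), (s2, r, d) = rng.choice(list(model.items()))
                f = 0.0 if d else gamma * max(q[(s2, b)] for b in ACTIONS)
                q[(s, a)] += alpha * (r + f - q[(s, a)])
            state = nxt
    return q
```

With a fixed planning count, standard Dyna-Q spends the same simulated effort whether or not the value table is still changing; a rule of this general kind shifts that effort toward the steps where updates are large. Whether this matches the paper's criterion cannot be determined from the abstract.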
Authors
朱霸坤
朱卫纲
李伟
李佳芯
杨莹
Zhu Bakun; Zhu Weigang; Li Wei; Li Jiaxin; Yang Ying (Department of Electronic and Optical Engineering, Space Engineering University, Beijing 101416, China)
Source
《兵工自动化》
2022, No. 7, pp. 1-4 (4 pages)
Ordnance Industry Automation
Funding
Project of the State Key Laboratory of Complex Electromagnetic Environment Effects (2020Z0203B).
Keywords
multi-function radar
jamming decision
reinforcement learning
Dyna-Q
self-adaptive