Recorded recurrent deep reinforcement learning guidance laws for intercepting endoatmospheric maneuvering missiles


Abstract: This work proposes a recorded recurrent twin delayed deep deterministic (RRTD3) policy gradient algorithm to address the challenge of constructing guidance laws for intercepting endoatmospheric maneuvering missiles in the presence of uncertainties and observation noise. The attack-defense engagement scenario is modeled as a partially observable Markov decision process (POMDP). Given the benefits of recurrent neural networks (RNNs) in processing sequential information, an RNN layer is incorporated into the agent's policy network to alleviate the bottleneck that traditional deep reinforcement learning methods face when dealing with POMDPs. Because the detection frequency of an interceptor is usually higher than its guidance frequency, the measurements from the interceptor's seeker during each guidance cycle are combined into one sequence that serves as the input to the policy network. During training, the hidden states of the RNN layer in the policy network are recorded to overcome the partial observability that this RNN layer itself introduces inside the agent. The training curves show that the proposed RRTD3 improves data efficiency, training speed, and training stability. The test results confirm the advantages of the RRTD3-based guidance laws over some conventional guidance laws.
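The abstract describes two implementation ideas. First, the seeker measurements from one guidance cycle are grouped into a single input sequence. Second, the policy RNN's hidden states are recorded during training. The abstract does not specify the RNN type, network sizes, or replay-buffer layout, so what follows is only a minimal pure-Python sketch under our own assumptions:

- A bias-free Elman (tanh) cell stands in for the paper's RNN layer.
- The detection-to-guidance frequency ratio is an integer.
- All names (`group_measurements`, `RecurrentPolicy`, `RecordedReplayBuffer`, `a_max`) and all numeric values are hypothetical.

The TD3 critics, target networks, target-policy smoothing, and delayed actor updates are not shown.

```python
import math
import random
from collections import deque


def group_measurements(measurements, ratio):
    """Split a seeker measurement stream into one sequence per guidance cycle.

    Assumption: ratio = detection frequency / guidance frequency is an integer.
    A trailing partial cycle is dropped.
    """
    n_cycles = len(measurements) // ratio
    return [measurements[i * ratio:(i + 1) * ratio] for i in range(n_cycles)]


class RecurrentPolicy:
    """Hypothetical Elman-RNN actor: measurement sequence -> acceleration command."""

    def __init__(self, obs_dim, hidden_dim, a_max, seed=0):
        rng = random.Random(seed)

        def init(rows, cols):
            return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]

        self.W_x = init(hidden_dim, obs_dim)     # input-to-hidden weights
        self.W_h = init(hidden_dim, hidden_dim)  # hidden-to-hidden weights
        self.w_out = init(1, hidden_dim)[0]      # hidden-to-action weights
        self.hidden_dim = hidden_dim
        self.a_max = a_max

    def initial_state(self):
        return [0.0] * self.hidden_dim

    def act(self, obs_seq, h):
        """Run the RNN over one guidance cycle's measurements, starting from state h."""
        for x in obs_seq:
            h = [math.tanh(sum(wx * xi for wx, xi in zip(row_x, x))
                           + sum(wh * hj for wh, hj in zip(row_h, h)))
                 for row_x, row_h in zip(self.W_x, self.W_h)]
        # tanh squashing bounds the command to [-a_max, a_max]
        action = self.a_max * math.tanh(sum(w * hj for w, hj in zip(self.w_out, h)))
        return action, h


class RecordedReplayBuffer:
    """Replay buffer that stores the RNN hidden state alongside each transition."""

    def __init__(self, capacity, seed=0):
        self.buffer = deque(maxlen=capacity)
        self.rng = random.Random(seed)

    def add(self, obs_seq, h_in, action, reward, next_obs_seq, h_out, done):
        # copy the hidden states so later updates cannot alias them
        self.buffer.append((obs_seq, list(h_in), action, reward,
                            next_obs_seq, list(h_out), done))

    def sample(self, batch_size):
        return self.rng.sample(list(self.buffer), batch_size)


# Usage: hypothetical 2-D measurements, 4 detections per guidance cycle.
policy = RecurrentPolicy(obs_dim=2, hidden_dim=4, a_max=30.0)
buffer = RecordedReplayBuffer(capacity=1000)
stream = [[0.01 * k, -0.02 * k] for k in range(12)]
cycles = group_measurements(stream, ratio=4)  # 3 guidance cycles

h = policy.initial_state()
for t in range(len(cycles) - 1):
    a, h_next = policy.act(cycles[t], h)
    buffer.add(cycles[t], h, a, 0.0, cycles[t + 1], h_next, False)
    h = h_next

# During training, a sampled sequence is replayed from its recorded state, not from zeros.
obs_seq, h_in, a_old, r, next_seq, h_out, done = buffer.sample(1)[0]
a_replayed, _ = policy.act(obs_seq, h_in)
```

Storing `h_in` with each transition lets a sampled sequence be replayed from the hidden state the agent actually had when it acted. Replaying from a zeroed state would discard the history the RNN had accumulated. In a full TD3 update, which is again our assumption about how the pieces fit together, the target actor would likewise start `next_obs_seq` from the stored `h_out`.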

Authors: Xiaoqi Qiu, Peng Lai, Changsheng Gao, Wuxing Jing

Affiliations: Department of Aerospace Engineering; Shanghai Electro-Mechanical Engineering Institute

Source: Defence Technology (防务技术), 2024, No. 1, pp. 457-470 (14 pages). Indexed in SCIE, EI, CAS, CSCD.

Funding: Supported by the National Natural Science Foundation of China (Grant No. 12072090).

Keywords: Endoatmospheric interception; Missile guidance; Reinforcement learning; Markov decision process; Recurrent neural networks

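Classification numbers (CLC): TP18 [Automation and Computer Technology: Control Theory and Control Engineering]; TJ765.3 [Weapon Science and Technology: Weapon Systems and Application Engineering]; TJ761.7 [Weapon Science and Technology: Weapon Systems and Application Engineering]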

