Dynamic Penetration Decision of Loitering Munition Group Based on Knowledge-assisted Reinforcement Learning

Abstract: Penetration control decision-making for a loitering munition group (LMG) is key to improving the autonomy and intelligence of LMG combat. Online generation of penetration maneuver commands is difficult in dynamic environments that contain interceptors and pop-up air-defense fire zones. To address this, a knowledge-assisted reinforcement learning algorithm for LMG penetration control decision (LMGPCD) is proposed. Domain knowledge and rule knowledge are used to improve the design of the state space and reward function, which enhances the algorithm's generalization ability and training convergence speed. An LMGPCD framework based on the soft actor-critic (SAC) algorithm is constructed to increase exploration efficiency. Expert experience and imitation learning are used to mitigate the narrow solution space and the shortage of efficient initial training experience that arise with multiple munitions and multiple threats. Experimental results show that the proposed algorithm generates effective penetration maneuver commands in real time in dynamic environments and outperforms the comparison methods, which verifies its effectiveness.
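
For context, soft actor-critic (SAC) is usually described as maximizing an entropy-regularized return, and the entropy bonus is what encourages broader exploration. The standard form is sketched below. The symbols (policy $\pi$, reward $r$, temperature $\alpha$, discount factor $\gamma$) follow the general SAC literature and are assumptions here; this record does not give the paper's own formulation.

```latex
J(\pi) = \mathbb{E}_{\tau \sim \pi}\left[\sum_{t=0}^{\infty} \gamma^{t}
  \Bigl( r(s_t, a_t) + \alpha\,\mathcal{H}\bigl(\pi(\cdot \mid s_t)\bigr) \Bigr)\right],
\qquad
\mathcal{H}\bigl(\pi(\cdot \mid s_t)\bigr)
  = -\,\mathbb{E}_{a \sim \pi(\cdot \mid s_t)}\bigl[\log \pi(a \mid s_t)\bigr]
```

A larger $\alpha$ weights entropy more heavily and favors exploration; as $\alpha \to 0$ the objective reduces to the ordinary discounted return.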

Authors: SUN Hao; LI Haiqing; LIANG Yan; MA Chaoxiong; WU Han (School of Automation, Northwestern Polytechnical University, Xi'an 710072, Shaanxi, China; Xi'an Modern Control Technology Research Institute, Xi'an 710065, Shaanxi, China)

Affiliations: School of Automation, Northwestern Polytechnical University; Xi'an Modern Control Technology Research Institute

Source: Acta Armamentarii (indexed in EI, CAS, CSCD, and the Peking University Core Journals list), 2024, No. 9, pp. 3161-3176 (16 pages)

Funding: National Natural Science Foundation of China (61873205).

Keywords: loitering munition group; knowledge-assisted deep reinforcement learning; soft actor-critic algorithm; dynamic environment; penetration control decision

Classification code: V279 [Aerospace Science and Technology: Aircraft Design]
